Geo tips & tricks: OSGeo, GDAL and other GIS fun.
Find me on <a href="http://twitter.com/evenrouault">Twitter</a>, <a href="http://github.com/rouault">GitHub</a> or on my company <a href="http://www.spatialys.com">Spatialys</a> website. Even Rouault

2021-09-26: Order of GeoJSON properties and graph theory

<p style="text-align: justify;">All this story started with this <a href="https://github.com/qgis/QGIS/issues/45139">QGIS bug</a> about a crash when a user adds a new field to a GeoJSON file. Whether GDAL or QGIS is the culprit could be debated. Let's look at the GDAL side of things.</p><p>Let's consider this simple GeoJSON file: <br /></p><p>{ "type": "FeatureCollection", "features": [<br /> { "type": "Feature", "geometry": null, "properties": { "A": "a", "C": "c" } },<br /> { "type": "Feature", "geometry": null, "properties": { "A": "a", "B": "b", "C": "c" } }<br />]} </p><p style="text-align: justify;">GDAL versions up to 3.3.x expose a layer definition with the fields in the order [A, C, B], and this is what confused QGIS, since the C field was appended as the last action and QGIS expected it to be the last field.<br /></p><p style="text-align: justify;">One could also argue that we should not worry about the order of properties at all, since a JSON <i>object</i> such as "properties" is unordered. However, the JSON parser and writer used by GDAL preserve the order of appearance of the keys, which makes GeoJSON behave similarly to formats such as Shapefile, GeoPackage or CSV, where the field order is fixed.
But in a GeoJSON file where there is no declaration of properties (no "header"), and where properties are omitted in some features, presenting a consistent field order is more challenging.</p><p style="text-align: justify;">How does current GDAL proceed to build the list of fields? It is very simple: iterate over the features and, for each feature, iterate over its properties; as soon as a new property is found, append it to the list of properties of the layer. So in our example above, after the first feature we have [A, C]. When parsing the second feature, we append B to it, hence [A, C, B]. Note also that the order in which features appear in the collection influences the order of properties exposed in the layer definition.<br /></p><p style="text-align: justify;">How to fix that? For this very simple case, we could say: "B appears after A, which is already in our list, and before C, which is already in our list just after A, so let's insert it in between". Great! Problem solved? Unfortunately not.</p><p style="text-align: justify;">What if you see features with the following properties:</p><p style="text-align: justify;">- [A, D] ==> current consolidated list [A, D]<br /></p><p style="text-align: justify;">- then [A, B, D] ==> current consolidated list [A, B, D]</p><p style="text-align: justify;">- then [C, D] ==> hmm, C is before D, but where relative to A and B?<br /></p><p style="text-align: justify;">- then [A, C] ==> we now know that C is after A. But before or after B?<br /></p><p style="text-align: justify;">- then [B, C] ==> OK, after B. So the final list is [A, B, C, D]<br /></p><p style="text-align: justify;">Our simple insertion logic no longer works here, and we can now feel that the problem is much more involved than it initially looked. We could also have situations where the dataset is so sparse that the relative order of some fields is indeterminate.
Or where fields appear in inconsistent order across features.<br /></p><p style="text-align: justify;">My initial thought was to use a sorting algorithm, such as std::sort from the C++ standard library, with a custom comparison function based on the (i, j) order of appearance (where (i, j) means that field i immediately appears before field j in a feature). But there are several issues:</p><p style="text-align: justify;">- the sort algorithm may invoke the comparison function on any pair of field names, but we only have knowledge of a subset of those. If we build a graph whose nodes are the field names, with a directed edge between i and j meaning that field i immediately appears before j in a feature, then we can determine whether there is a path between any pair of nodes using standard graph algorithms.<br /></p><p style="text-align: justify;">- the comparison function of std::sort requires a <a href="https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings">strict weak ordering</a>. When there is a path in the graph between two nodes, the ordering is clear. We note immediately that the graph must not have cycles, otherwise we would violate the asymmetry property of the strict weak ordering. So when building the graph, we should reject any new edge i->j that would create a cycle, which boils down to checking whether the reverse path j->i already exists. But what to do for nodes that are not related by a path? We could decide that they are incomparable. However, the "transitivity of incomparability" property could then be violated: for example, if there is no path between A and B, nor between B and C (say B is an isolated node), there might still be a path between A and C. So it is far from obvious that we can define a comparison function that respects the strict weak ordering.
And if it fails to respect it, then the std::sort() execution is undefined behaviour, meaning that anything can happen: at best, a wrong result; at worst, a crash, an infinite loop, etc. Look at this <a href="https://medium.com/@shiansu/strict-weak-ordering-and-the-c-stl-f7dcfa4d4e07">post</a> for other examples of comparison functions that subtly violate strict weak ordering.<br /></p><p style="text-align: justify;">In the end, we want to somewhat "linearize" our graph while respecting the directed edges.</p><p style="text-align: justify;">So given: <br /></p><p style="text-align: center;">[figure: directed graph of the field names, with the precedence edges observed in the features]</p>
W2upsRhMcBCvyP1YGdweiFwSZAAAAAElFTkSuQmCC" width="413" /> <br /></p><p style="text-align: justify;">We want to get a result like [A, F, B, G, D, C, E], or [F, A, B, C, D, E, G], or ...<br /></p><p style="text-align: justify;">It appears that this is a known problem in graph theory: computing the <a href="https://en.wikipedia.org/wiki/Topological_sorting">topological ordering</a> of a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a>. As shown in the above example, there are in general several solutions. To try to make the result less "random", we use the lexicographical order of field names for the working set S of Kahn's algorithm, which provides an additional sorting rule for nodes that have no direct relationship in the graph.</p><p style="text-align: justify;"></p><p style="text-align: justify;">The result of that work is in <a href="https://github.com/OSGeo/gdal/pull/4552">https://github.com/OSGeo/gdal/pull/4552</a></p><p style="text-align: justify;"><br /></p><p style="text-align: justify;"></p><p style="text-align: justify;"><i></i></p>Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-28968584320517476202021-09-14T08:00:00.003-07:002021-09-14T08:19:31.617-07:00The ultimate guide for successful contribution to open source<div style="text-align: justify;">(This is a post I started in December 2019 and didn't publish because I felt it was too negative. But recent events show that it is still a current topic, and at least this will document my own perception of things) <br /></div><div style="text-align: justify;"> </div><div style="text-align: justify;">We have lately stumbled upon a few clumsy attempts by corporations at contributing to open source software.
Needless to say, this is a bumpy ride not for the faint of heart, from both sides of the equation.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If you want to stop your reading here, just remember that good old motto: When in Rome, do as the Romans do. And be prepared to become a Roman citizen.<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We, open source developers, have our own hard-to-decipher customs. We pay tribute to the Almighty Git (*), preferably his most revered (often with awe) incarnation called GitHub. You may also find followers of GitLab, gitea or gitorious.</div>
<div style="text-align: justify;">
In that case, never pronounce the word 'GitHub' before them. It is also said that some might still use email patches: be sure to set an 80-character limit to lines, use LF line endings and refrain from using PNG signatures. Be sure to stay away from SourceForge followers. They will continuously impose on you advertising messages that will destroy your sanity.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You should subscribe to mailing lists. Be wary of trolls. We may occasionally engage in flame wars, generally sanctioned by motions. We are structured into groups that defer to a Project Steering Committee to decide (or not) which way to follow. Our groups regularly gather at solstice time in dark rooms for a week-long trance (note: was written pre-pandemic), known as a hackfest, hackathon or code sprint. For the tribes involved in geospatial, we have an annual Pow-Wow, called FOSS4G, during which we temporarily bury the hatchet and occasionally thank our sponsors. You may stumble upon a C89 developer sharing a beer with a Java 11 one, or an adorer of Leaflet trying to convert an OpenLayers follower. If you want to join the Pow-Wow, remove your tie and suit, wear a t-shirt and sandals, and make sure to display a prominent "I Love Open Source" sticker on your laptop. You may also add an "I Love Shapefile" or "I Love GeoPackage" sticker, but only if you are confident about which one the group pays tribute to. Failure to display the appropriate sticker will expose you to terrific insults about 10-character limits or obscure write-ahead log locking issues. If unsure, use the "I ? Shapefile" sticker.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We have taboo words. Thou Shalt Not Talk about business plans, intellectual property, patent portfolios, customer demand, strategy, costless SDKs, CVE numbers, internal policy, consolidated codebases, or education license fees. Planning and roadmap should also be pronounced with care, as some tribes will probably not have even the slightest idea of what you are talking about. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If despite all those strange habits, you want to join with us, be prepared to follow a long and demanding initiation path. You will have to demonstrate your (possibly affected) adoration of our principles. As strange as it can be, we abhor being offered large presents that we cannot prevent ourselves from seeing as Trojan horses. You will rather have to locate a long list of problems pinned on a wall called the bug tracker, where each member of the community has written an issue or a wish. Start with a modest one that makes sense to you, solve it, and offer its resolution to the Reviewer in a sufficiently argued Pull Request. The Reviewer, who often equates to the Maintainer, will scrutinize your gift, often at night time after Day Job. He may decide to accept it right away, or return it to you with a comment requesting a change, or just ignore it with the highest contempt. Be humble and obey his commands. You may beg for enlightenment, but never object, even if you cannot make sense of his rebuttal. He is the almighty after the Almighty. Never Rebase if asked to Merge; never Merge if asked to Rebase. Refactor when asked, but don't when not! You should rewrite history before submitting, or maybe not. If unsure, consult Contributing.md, or be prepared for RTFM to be yelled at you (only for the tribes that have not yet written CodeOfConduct.md). 
Do not even consider objecting that you have not been tasked to address his demands. You must also make sure to listen to the complaints of the Continuous Integration (CI) half-gods: the Reviewer will probably not even look at your gift until CI has expressed its satisfaction. Retry as many times as needed until they are pleased. You may attempt to submit an RFC, but be prepared for lengthy and lively discussions. Listen and correct, or you may not survive your first "-1" spell!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We especially praise gifts that have no direct value for you: improved test suites (e.g. <a href="https://github.com/OSGeo/gdal/issues/4407">https://github.com/OSGeo/gdal/issues/4407</a>), documentation additions and fixes, answering users on the mailing list. Only when you feel that you have built enough trust might you try to offer your initial gift. But, even if it is accepted, the most surprising habit is that your gift will remain yours. You will have to make sure you regularly remove the dust that accumulates on it over time, so it remains constantly shiny. While removing dust from your gift, never neglect to also remove it from nearby gifts offered by others. Otherwise the Maintainer might threaten to invoke the terrible Revert incantation on you. Also consider that existing contributors to a project might see your new code, which they won't have direct use for, as an extra burden to their daily tasks (studying the commit history of <a href="https://github.com/qgis/QGIS/commits/master/src/providers/hana">https://github.com/qgis/QGIS/commits/master/src/providers/hana</a>, or even more crudely of <a href="https://github.com/qgis/QGIS/commits/master/src/providers/db2">https://github.com/qgis/QGIS/commits/master/src/providers/db2</a>, demonstrates that).<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Ready for a ride, and possibly enjoying the feeling that "our" code can also become "yours" ?</div>
<br />
<br />
<div style="text-align: justify;">
(*) some tribes, cut off from the rest of the world, are said to still pursue their adoration of more ancient divinities sometimes known as SVN or CVS. We cannot confirm, having eradicated the last remains of those old cults.</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-20875062559266896252021-08-31T07:39:00.002-07:002021-08-31T07:39:21.950-07:00Analysis of OSS-Fuzz related data on GDAL<div><p style="text-align: justify;"><a href="https://gdal.org">GDAL</a> has now been under the continuous scrutiny of <a href="https://github.com/google/oss-fuzz">OSS-Fuzz</a> for more than 4 years. To keep it simple, OSS-Fuzz is a continuously running infrastructure that stresses software with (not-so-)random data to discover various flaws, and automatically files issues in a dedicated issue tracker, with reproducer test cases and stack traces when available. It is time to use the accumulated data to identify a few trends.</p><p style="text-align: justify;">First, we can see a total of <a href="https://bugs.chromium.org/p/oss-fuzz/issues/list?q=gdal&can=1">1787 issues</a> found, which represents on average a bit more than one per day. Out of those, only <a href="https://bugs.chromium.org/p/oss-fuzz/issues/list?q=gdal&can=2">38</a> are still open (so 97.8% have been fixed or are no longer reproducible). Those 1787 issues are out of a total of 37 769 issues filed against all 530 <a href="https://github.com/google/oss-fuzz/tree/master/projects">enrolled projects</a>, hence representing 4.6 % (significantly higher than the naive 1 / 530 = 0.2 % proportion that we could expect, at least if all projects were of the same size, but GDAL is likely larger than the average). Does that mean that the quality of GDAL is lower than the average of enrolled projects, or that it is stressed in a more efficient way...? 
With most of its code dealing with a lot of file formats, GDAL is an ideal fit for fuzz testing.<br /></p><p style="text-align: justify;">We should mention that a number of issues attributed to GDAL actually belong to some of its dependencies: PROJ (coordinate transformation), Poppler (PDF rendering), cURL (network access), SQLite3 (embedded database), Xerces-C (XML parsing), etc. And we regularly report or fix those issues in those upstream components.</p><p style="text-align: justify;">Addressing those issues is now facilitated by the <a href="https://gdal.org/sponsors/">sponsorship program</a>, which allows us to spend funded time on such unsexy and usually hard-to-fund topics that are nonetheless important for enhancing the robustness of the software. <br /></p><p style="text-align: justify;">We have run a <a href="https://github.com/OSGeo/gdal/blob/master/gdal/scripts/analyze_ossfuzz.py">script</a> that parses git commit logs to identify commits that explicitly reference an issue filed by OSS-Fuzz, and tries to analyze the commit message to find the category of the issue it addresses.</p><p style="text-align: justify;">Here's its output: <br /></p><p style="text-align: justify;"><span style="font-family: courier;">Category Count Pct<br />----------------------------------------------------------------------<br />integer_overflow_signed 178 14.67 %<br />excessive_memory_use 174 14.34 %<br />other_issue 114 9.40 %<br />buffer_overflow_unspecified 105 8.66 %<br />excessive_processing_time 102 8.41 %<br />null_pointer_dereference 93 7.67 %<br />integer_overflow_unsigned 68 5.61 %<br />division_by_zero_integer 54 4.45 %<br />heap_buffer_overflow_read 37 3.05 %<br />unspecified_crash 32 2.64 %<br />stack_call_overflow 31 2.56 %<br />memory_leak_error_code_path 23 1.90 %<br />heap_buffer_overflow_write 22 1.81 %<br />division_by_zero_floating_point_unknown_consequence 19 1.57 %<br />stack_buffer_overflow_unspecified 19 1.57 %<br />invalid_cast 14 1.15 %<br 
/>invalid_shift_unspecified_dir 14 1.15 %<br />memory_leak_unspecified 13 1.07 %<br />assertion 13 1.07 %<br />invalid_enum 11 0.91 %<br />division_by_zero_floating_point_harmless 10 0.82 %<br />infinite_loop 10 0.82 %<br />integer_overflow_unsigned_harmless 8 0.66 %<br />invalid_memory_dereference 7 0.58 %<br />undefined_shift_left 7 0.58 %<br />double_free 6 0.49 %<br />stack_buffer_overflow_read 6 0.49 %<br />use_after_free 6 0.49 %<br />integer_overflow_harmless 5 0.41 %<br />undefined_behavior_unspecified 3 0.25 %<br />negative_size_allocation 3 0.25 %<br />unhandled_exception 2 0.16 %<br />invalid_shift_right 2 0.16 %<br />unsigned_integer_underflow 1 0.08 %<br />uninitialized_variable 1 0.08 %<br />----------------------------------------------------------------------<br />Total 1213<br /></span><span style="font-family: courier;"><br /></span><br />So 1213 commits for 1787 - 38 = 1749 fixed issues: the difference is duplicated issues, non-reliably reproducing issues that end up being closed, or issues fixed in other code bases than GDAL.<br /></p><p style="text-align: justify;">Let's dig into those categories, from most frequently hit to lesser ones:</p></div><ul style="text-align: justify;"><li>integer_overflow_signed: an arithmetic operation on a signed integer whose result overflows its size. This is undefined behavior in C/C++, meaning that anything can happen in theory. In practice, most reasonable compilers and common CPU architectures implementing two's-complement signed integers will have a wrap-around behavior, and not crash when the overflow occurs. However this often has later consequences, like allocating an array of the wrong size, and having out-of-bounds access.<br /> <br /></li><li>excessive_memory_use: this is not a vulnerability by itself. 
This issue is raised because processes that run under OSS-Fuzz are limited to 2 GB of RAM usage, which is reasonable given that OSS-Fuzz manipulates input buffers that are generally a few tens of kilobytes in size. It is thus expected that for those small inputs, RAM consumption should remain small. When that's violated, it is because the code generally puts too much trust in some fields of the input data that drive a memory allocation, without checking them against reasonable bounds, or the file size. However for some file formats, it is difficult to implement definitive bounds because they allow valid small files that need a lot of RAM to be processed. Part of the remaining open issues belong to that category.<br /><br /></li><li>other_issue: issues that could not be classified under a more precise category. Bonus points for anyone improving the script to analyze the diff and
figure out from the code the cases where the commit message lacks details! Or perhaps just parse the OSS-Fuzz issue itself, which gives a categorization.<br /><br /></li><li>buffer_overflow_unspecified: an access outside the validity area of some buffer (string, array, vector, etc.), where we couldn't determine if it is heap allocated or stack allocated, or a read attempt or write attempt. Can potentially result in arbitrary code execution.<br /><br /></li><li>excessive_processing_time: this is when a program exceeds the timeout of 60 seconds granted by OSS-Fuzz to complete processing of an input buffer. This is often due to a sub-optimal algorithm (e.g. quadratic performance whereas linear can be met), or an unexpected network access. Most of the remaining open issues are in that category. A significant number are also in an out-of-memory situation hit by the fuzzer while iterating over many inputs, but often that cannot be reproduced on an individual test case. We suspect heap fragmentation to happen in some of those situations.<br /><br /></li><li>null_pointer_dereference: a classic programming issue: accessing a null pointer, which results in an immediate crash.<br /><br /></li><li>integer_overflow_unsigned: this one is interesting. Technically in C/C++, overflow of unsigned integers is a well-defined behavior. Wrap-around behavior is guaranteed by the standards. However we assumed that in most cases, the overflow was unintended and could lead to similar bugs as signed integer overflow, hence we opted in for OSS-Fuzz to consider those overflows as issues. For the very uncommon cases where the overflow is valid (e.g. when applying a difference filter on a sequence of bytes), we can tag the function where it occurs with a <span style="font-family: courier;">__attribute__((no_sanitize("unsigned-integer-overflow")))</span> annotation.<br /><br /></li><li>division_by_zero_integer: an integer divided by zero, with zero being evaluated as an integer. 
Results in an immediate crash, on at least the x86 architecture.<br /><br /></li><li>heap_buffer_overflow_read: read access outside the validity area of a heap-allocated data structure. Generally results in a crash.<br /><br /></li><li>unspecified_crash: crash for an unidentified reason. Same bonus points as above for helping to better categorize them.<br /><br /></li><li>stack_call_overflow: recursive calls to methods that end up blowing the size limit of the stack, which results in a crash.<br /><br /></li><li>memory_leak_error_code_path: memory leak that is observed in a non-nominal code path, that is on corrupted/hostile datasets.<br /><br /></li><li>heap_buffer_overflow_write: write access outside the validity area of a heap-allocated data structure. Generally results in a crash. Could also sometimes be exploited for arbitrary code execution.<br /><br /></li><li>division_by_zero_floating_point_unknown_consequence: a variable is divided by zero, and this operation is done with floating point evaluation. In C/C++, this is undefined behavior, but on CPU architectures implementing IEEE-754 (that is pretty much all of them nowadays) with default settings, the result is either infinity or not-a-number. If that result is cast to an integer, this results in undefined behavior again (generally not crashing), and various potential bugs (which can be crashing)<br /><br /></li><li>stack_buffer_overflow: a read or write access outside of a stack-allocated buffer. Often results in crashes, and, for a write access, can sometimes be exploited for arbitrary code execution.<br /><br /></li><li>invalid_cast: this may be an integer cast to an invalid value for an enumeration (unspecified behavior, which can result in bugs due to not catching that situation later), or an instance of a C++ class cast to an invalid type (unspecified behavior, crash likely)<br /><br /></li><li>invalid_shift_unspecified_direction: a left- or right-shift binary operation, generally on a signed integer. 
For left-shift, this is when the most-significant bit ends up being set (either by shifting a positive value too much, which results in a negative value, or by shifting a negative value), or when shifting by a number of bits equal to or greater than the width of the integer. For right-shift, this is when shifting by a number of bits equal to or greater than the width of the integer. Those undefined behaviors do not result in immediate crashes on common compilers/platforms, but can lead to subsequent bugs.<br /> <br /></li><li>memory_leak_unspecified: self-explanatory.<br /><br /></li><li>assertion: an assert() in the code is hit. A lot of what we initially think of as programming invariants can actually be violated by specially crafted input, and should be replaced by classic checks to error out in a clean way.<br /><br /></li><li>invalid_enum: a particular case of invalid_cast where an invalid value is stored in a variable of an enumeration type.<br /> <br /></li><li>division_by_zero_floating_point_harmless: a floating-point division by zero, whose consequences are estimated to be harmless. For example the NaN or infinity value is directly returned to the user and does not influence further execution of the code.<br /><br /></li><li>infinite_loop: a portion of the code is executed in an endless way. This is a denial of service.<br /><br /></li><li>integer_overflow_unsigned_harmless: an example of that could be some_string.substr(some_string.find(' ') + 1) to extract the part of a string after the first space character, or the whole string if there's none. find() will return std::string::npos, which is the largest positive size_t, and thus adding 1 to it will result in 0.<br /><br /></li><li>invalid_memory_dereference: generally the same as a heap_buffer_overflow_read.<br /> <br /></li><li>invalid_shift_left: mentioned above.<br /><br /></li><li>double_free: a heap-allocated buffer is destroyed twice. A later crash is likely to occur. 
Could potentially be exploited for arbitrary code execution.<br /><br /></li><li>stack_buffer_overflow_read: mentioned above<br /><br /></li><li>use_after_free: subcase of heap_buffer_overflow_read where one accesses some buffer after it has been freed.<br /> <br /></li><li>integer_overflow_harmless: mentioned above, but here, if we exclude the undefined behavior aspect of it, consequences are estimated to be harmless.<br /><br /></li><li>undefined_behavior_unspecified: some undefined behavior of a non-identified category, restricted to those caught by the UBSAN analyzer of Clang.<br /><br /></li><li>negative_size_allocation: a negative size (actually an unsigned integer with its most significant bit set) is passed to a heap memory allocation routine. Not a bug by itself, but often reflects a previous issue.<br /><br /></li><li>unhandled_exception: a C++ exception that propagates up to the main() without being caught. A crash is ensured for C code, or for C++ callers not expecting it. As the fuzzer programs used by GDAL use the C API, such an exception popping up is definitely a bug.<br /><br /></li><li>invalid_shift_right: mentioned above.<br /><br /></li><li>unsigned_integer_underflow: similar to the overflow but going through the negative values. Well-defined behavior according to the standards, but often undesirable.<br /><br /></li><li>uninitialized_variable: access to a variable whose content is uninitialized. There are very few instances of that. The reason is probably that we extensively use strict compiler warnings and static code analyzers (cppcheck, clang static analyzer and Coverity Scan), which are very good at catching such issues.</li></ul><p style="text-align: justify;">Despite the large number of categories caught by the sanitizers run by OSS-Fuzz, there are some gaps not covered. For example, casting an integer (or floating point value) to a narrower integer type with a value that does not fit into the target type. 
Those are considered "implementation-defined" (the compiler must do something, potentially different from another compiler, and document what it does), and thus are not caught by undefined behavior sanitizers.<br /></p><div><p style="text-align: justify;"></p><p style="text-align: justify;">PS: For those interested in academic research analyzing outputs of OSS-Fuzz, you can find <a href="https://arxiv.org/pdf/2103.11518.pdf">this paper</a> (where GDAL actually appears in one figure)<br /></p></div>Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-48239364886924082412019-05-02T05:52:00.003-07:002019-05-02T05:52:48.191-07:00Incremental Docker builds using ccache<div style="text-align: justify;">
These past days, I've experimented with Docker to be able to produce "official" <a href="https://hub.docker.com/r/osgeo/gda">nightly builds of GDAL</a>.</div>
<div style="text-align: justify;">
Docker Hub is supposed to have an automated build mechanism, but the cloud resources they put behind that feature seem to be insufficient to sustain the demand and builds tend to drag forever.</div>
<div style="text-align: justify;">
Hence I decided to set up a local cron job to refresh my images and push them. Of course, as there are currently <a href="https://github.com/OSGeo/gdal/tree/master/gdal/docker">5 different Dockerfile configurations</a> and building both PROJ and GDAL from scratch could be time consuming, I wanted this to be as efficient as possible. One observation is that between two nightly builds, very few files change on average, so ideally I would want to recompile only the ones that have changed, and refresh and push as few updated Docker layers as possible.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are several approaches I combined to optimize the builds. For those already familiar with Docker, you can probably skip to the "Use of ccache" section of this post.</div>
<br />
<h3>
Multi-stage builds</h3>
<div style="text-align: justify;">
This is a Docker 17.05 feature in which you can define several steps (each of which will form a separate image), and later steps can copy from the file system of the previous steps. Typically you use a two-stage approach.</div>
<div style="text-align: justify;">
The first stage installs development packages, builds the application and installs it in some /build directory.</div>
<div style="text-align: justify;">
The second stage starts from a minimal image, installs the runtime dependencies, and copies the binaries generated at the previous stage from /build to the root of the final image.</div>
<div style="text-align: justify;">
This approach avoids shipping any development packages in the final image, which keeps it lean.</div>
<br />
Such Dockerfile will look like:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">FROM ubuntu:18.04 AS builder<br />RUN apt-get install g++ make<br />RUN ./configure --prefix=/usr && make && make install DESTDIR=/build<br /><br />FROM ubuntu:18.04 AS finalimage<br />RUN apt-get install libstdc++6<br />COPY --from=builder /build/usr/ /usr/</span><br />
<br />
<br />
<h3>
Fine-grained layering of the final image</h3>
<div style="text-align: justify;">
Each step in a Dockerfile generates a layer; chained together, the layers form an image.</div>
<div style="text-align: justify;">
When pulling/pushing an image, layers are processed individually, and only the ones that are not present on the target system are pulled/pushed.</div>
<div style="text-align: justify;">
One important note is that the refresh/invalidation of a step/layer causes the refresh/invalidation of later steps/layers (even if the content of a later layer does not change in a user-observable way, its internal ID will change).</div>
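This cascading invalidation can be modeled with a toy hash chain, where each layer ID depends on the parent layer ID plus that step's content. This is only an illustration of the cascade; Docker's actual content-addressing scheme is more involved:

```python
import hashlib

def layer_ids(steps):
    # Toy model: a layer's ID hashes its parent's ID plus its own content,
    # so invalidating one step cascades to every later layer, even when the
    # later steps' content is byte-for-byte identical.
    ids, parent = [], ""
    for step in steps:
        parent = hashlib.sha256((parent + step).encode()).hexdigest()[:12]
        ids.append(parent)
    return ids

unchanged = layer_ids(["apt-get install deps", "COPY include", "COPY lib"])
touched = layer_ids(["apt-get install deps", "COPY include v2", "COPY lib"])
# First layer keeps its ID; the modified layer and everything after it get new IDs,
# even though the final "COPY lib" step is identical in both builds.
```

This is why ordering the least-varying steps first matters: everything after the first changed step must be rebuilt and re-pushed.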
<div style="text-align: justify;">
So one approach is to put first in the Dockerfile the steps that will change the least frequently, such as dependencies coming from the package manager, third-party dependencies whose versions rarely change, etc. And at the end, the applicative part. Even the applications refreshed as part of the nightly builds can be decomposed into fine-grained layers.</div>
<div style="text-align: justify;">
In the case of GDAL and PROJ, the installed directories are:</div>
<span style="font-family: "Courier New", Courier, monospace;">$prefix/usr/include</span><br />
<span style="font-family: "Courier New", Courier, monospace;">$prefix/usr/lib</span><br />
<span style="font-family: "Courier New", Courier, monospace;">$prefix/usr/bin</span><br />
<span style="font-family: "Courier New", Courier, monospace;">$prefix/usr/share/{proj|gdal}</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The lib directory is the most frequently varying one (each time a .cpp file changes, the .so changes).</div>
<div style="text-align: justify;">
But installed include files and resources tend to be less frequently updated.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So a better ordering of our Dockerfile is:</div>
<span style="font-family: "courier new" , "courier" , monospace;">COPY --from=builder /build/usr/share/gdal/ /usr/share/gdal/<br />COPY --from=builder /build/usr/include/ /usr/include/<br />COPY --from=builder /build/usr/bin/ /usr/bin/<br />COPY --from=builder /build/usr/lib/ /usr/lib/</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
With one subtlety: as part of our nightly builds, the sha1sum of the HEAD of the git repository is embedded in a string of $prefix/usr/include/gdal_version.h. So in the builder stage, I separate that precise file from the other include files and put it in a dedicated /build_most_varying target together with the .so files.</div>
<span style="font-family: "courier new" , "courier" , monospace;"><br />RUN [..] \<br /> && make install DESTDIR=/build \<br /> && mkdir -p /build_most_varying/usr/include \<br /> && mv /build/usr/include/gdal_version.h /build_most_varying/usr/include \<br /> && mv /build/usr/lib /build_most_varying/usr</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
And thus, the finalimage stage is slightly changed to:</div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;">COPY --from=builder /build/usr/share/gdal/ /usr/share/gdal/<br />COPY --from=builder /build/usr/include/ /usr/include/<br />COPY --from=builder /build/usr/bin/ /usr/bin/<br />COPY --from=builder /build_most_varying/usr/ /usr/</span><br />
<br />
<h3>
Layer depending on a git commit</h3>
<div style="text-align: justify;">
In the builder stage, the step that refreshes the GDAL build depends on an argument, GDAL_VERSION, that defaults to "master":</div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;">ARG GDAL_VERSION=master<br />RUN wget -q https://github.com/OSGeo/gdal/archive/${GDAL_VERSION}.tar.gz \<br /> && build instructions here...</span><br />
<div style="text-align: justify;">
Due to how Docker layer caching works, building this Dockerfile several times in a row would not refresh the GDAL build (unless you invoke docker build with the --no-cache switch, which disables all layer caching). So the script that triggers the docker build gets the sha1sum of the latest git commit and passes it with: </div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;">GDAL_VERSION=$(curl -Ls https://api.github.com/repos/OSGeo/gdal/commits/HEAD -H "Accept: application/vnd.github.VERSION.sha")<br />docker build --build-arg GDAL_VERSION=${GDAL_VERSION} -t myimage .</span><br />
<div style="text-align: justify;">
In the (unlikely) event that the GDAL repository has not changed, no new build will even be attempted.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Note: this part is not necessarily a best practice. Other Docker mechanisms, such as using a Git URL as the build context, could potentially be used. But as we want to be able to refresh both GDAL and PROJ, that would not really be suitable. Another advantage of the above approach is that the Dockerfile is self-sufficient: an image can be created with just "docker build -t myimage ."</div>
<br />
<h3>
Use of ccache</h3>
<div style="text-align: justify;">
This is the part for which I could not find a ready-made, easy-to-deploy solution.</div>
<br />
<div style="text-align: justify;">
With the previous techniques, we have a black-and-white situation: a GDAL build is either entirely cached by Docker layer caching when the repository did not change at all, or completely redone from scratch when the commit id has changed (even if the change does not affect the installed files at all). It would be better if we could use <a href="https://github.com/ccache/ccache">ccache</a> to minimize the number of files to rebuild. </div>
<div style="text-align: justify;">
Unfortunately docker build does not allow mounting a volume in which the ccache directory would be stored (apparently for security reasons). There is an <a href="https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md">experimental RUN --mount=type=cache feature</a> in Docker 18.06 that could perhaps be used equivalently, but it requires both the client and the daemon to be started in experimental mode.</div>
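<div style="text-align: justify;">
For reference, that experimental BuildKit syntax would look roughly like this (a sketch only, assuming a BuildKit-enabled daemon and the experimental dockerfile frontend; it is not the solution retained here):</div>

```dockerfile
# syntax = docker/dockerfile:experimental
# Sketch: the ccache directory would persist across builds in a cache
# mount, without ever being baked into an image layer.
RUN --mount=type=cache,target=/root/.ccache \
    export CC="ccache gcc" CXX="ccache g++" \
    && make && make install DESTDIR=/build
```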
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The trick I use, which has the benefit of working with a default Docker installation, is to download the content of a ccache directory from the host into the Docker build container, do the build, and upload the modified ccache back onto the host.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I use rsync for that, as it is simple to set up. Initially I ran a rsync daemon directly on the host but, inspired by <a href="https://github.com/WebHare/ccache-memcached-server">https://github.com/WebHare/ccache-memcached-server</a> which proposes an alternative, I have modified it to run in a Docker container, gdal_rsync_daemon, which mounts the host ccache directory. The benefit of my approach over the ccache-memcached-server one is that it does not require a patched version of ccache to run in the build instance.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So the synopsis is:</div>
<br />
host cache directory <--(docker volume mount)--> gdal_rsync_daemon (Docker container) <--(rsync network protocol)--> Docker build instance<br />
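<div style="text-align: justify;">
On the daemon side, the setup boils down to an rsync module exposing the mounted cache. This is a hypothetical rsyncd.conf, not the exact one used by gdal_rsync_daemon (the module name and mount path are made up for illustration):</div>

```ini
# Hypothetical rsyncd.conf for the gdal_rsync_daemon container,
# assuming the host ccache directory is mounted at /ccache
[gdal]
    path = /ccache
    read only = false
    use chroot = false
```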
<br />
<br />
<div style="text-align: justify;">
You can consult <a href="https://github.com/OSGeo/gdal/blob/26342f736f38fd5c7ba8b63fcfe9d29648d67035/gdal/docker/util.sh#L223:L269">here</a> the relevant portion of the launch script that builds and starts the gdal_rsync_daemon. The corresponding Dockerfile step in the builder stage is rather straightforward:</div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /># for alpine; or equivalent with other package managers<br />RUN apk add --no-cache rsync ccache<br /><br />ARG RSYNC_REMOTE<br />ARG GDAL_VERSION=master<br />RUN if test "${RSYNC_REMOTE}" != ""; then \<br /> echo "Downloading cache..."; \<br /> rsync -ra ${RSYNC_REMOTE}/gdal/ $HOME/; \<br /> export CC="ccache gcc"; \<br /> export CXX="ccache g++"; \<br /> ccache -M 1G; \<br /> fi \<br /> # omitted: download source tree depending on GDAL_VERSION<br /> # omitted: build<br /> && if test "${RSYNC_REMOTE}" != ""; then \<br /> ccache -s; \<br /> echo "Uploading cache..."; \<br /> rsync -ra --delete $HOME/.ccache ${RSYNC_REMOTE}/gdal/; \<br /> rm -rf $HOME/.ccache; \<br /> fi</span><br />
<br />
<div style="text-align: justify;">
I also considered a simplified variation of the above that would not use rsync: after the build, we would "docker cp" the cache from the build image to the host, and at the next build copy the cache into the build context. But that would have two drawbacks:</div>
<ul style="text-align: justify;">
<li style="text-align: justify;">our build layers would contain the cache</li>
<li style="text-align: justify;">any change in the cache would cause the build context to be different and subsequent builds to have their cached layers invalidated.</li>
</ul>
<br />
<h3>
Summary</h3>
<div style="text-align: justify;">
We have managed to create a <a href="https://github.com/OSGeo/gdal/blob/master/gdal/docker/alpine-normal/Dockerfile">Dockerfile</a> that can be used in standalone mode to create a GDAL build from scratch, or can be integrated with a wrapper <a href="https://github.com/OSGeo/gdal/blob/master/gdal/docker/util.sh">build.sh</a> script that offers incremental rebuild capabilities to minimize the use of CPU resources. The image has fine-grained layering, which also minimizes upload and download times for frequent push / pull operations.</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-38015468853054631532019-01-31T12:04:00.000-08:002019-01-31T12:04:03.732-08:00SRS barn raising: 8th report. Ready for your testing !<div style="text-align: justify;">
This is the 8th progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As the title implies, a decisive milestone has now been reached, with the "gdalbarn" branches of libgeotiff and GDAL having now been merged into their respective master branches.</div>
<div style="text-align: justify;">
<br />
</div>
<div style="text-align: justify;">
On the PROJ side, a number of fixes and enhancements have been done:</div>
<div style="text-align: justify;">
- missing documentation for a few functions, for the evolution of cs2cs, and for the new projinfo utility has been added</div>
<div style="text-align: justify;">
- the parser of the WKT CONCATENATEDOPERATION construct can now understand steps presented in reverse order</div>
<div style="text-align: justify;">
- a few iterations to update the syntax parsing rules of WKT2:2018 following the latest adjustments done by the OGC WKT Standard Working Group</div>
<div style="text-align: justify;">
- in my previous work, I had introduced a "PROJ 5" convention to export CRS as pipeline/unitconvert/axisswap strings, in an attempt to improve on the PROJ.4 format used by GDAL and other products. However, after discussion with other PROJ developers, we realized that it is likely a dead-end, since it is still lossy in many respects and can cause confusion with coordinate operations. Consequently the PROJ_5 convention will be identical to PROJ_4 for CRS export, and the use of PROJ strings to express CRS themselves is discouraged. It can still make sense when using the "early-binding" approach and specifying towgs84/nadgrids/geoidgrids parameters, but in a late-binding approach WKT is much more powerful at capturing important information like geodetic datum names.</div>
<div style="text-align: justify;">
- when examining how the new code I added these past months fits with the existing PROJ codebase, it became clear that there was confusion between importing PROJ strings expressing coordinate operations and PROJ strings expressing a CRS. So for the latter use case, a "+type=crs" must now be added to the PROJ string. As a consequence, the proj_create_from_proj_string() and proj_create_from_user_input() functions have been removed, and proj_create() can now be used for all types of PROJ strings.</div>
<div style="text-align: justify;">
- The PROJ_LIB environment variable now supports multiple paths, separated by colon on Unix and semi-colon on Windows</div>
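<div style="text-align: justify;">
For example (a sketch; the paths are illustrative), resource directories can now be chained on Unix with:</div>

```shell
# PROJ_LIB now accepts a list of directories, ':'-separated on Unix
# (';' on Windows). These paths are illustrative.
export PROJ_LIB="/usr/share/proj:/opt/extra_grids"
# The first entry of the list, extracted with plain shell expansion:
first_dir="${PROJ_LIB%%:*}"
echo "$first_dir"
```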
<div style="text-align: justify;">
<br />
</div>
<div style="text-align: justify;">
On the GDAL side,</div>
<ul style="text-align: justify;">
<li>the OGRCoordinateTransformation class now uses the PROJ API to automatically compute the best transformation pipeline, enabling late-binding capabilities. In the case where the user does not provide an explicit area of use and several coordinate operations are possible, the Transform() method can automatically switch between coordinate operations depending on the input coordinate values. This should offer behaviour similar to previous versions, for example for NAD27 to NAD83 conversion, when PROJ had a hardcoded +nadgrids=@conus,@alaska,@ntv2_0.gsb,@ntv1_can.dat rule. This dynamic selection logic has also been moved to the PROJ proj_create_crs_to_crs() function. Note however that this might not always lead to the desired results, so specifying a precise area of interest, or even a specific coordinate operation, is preferred when full control is needed.</li>
<li>gdalinfo, ogrinfo and gdalsrsinfo now output WKT2:2018 by default (this can be changed with a command line switch). On the API side, the exportToWkt() method will still export WKT1 by default (this can of course be changed too). The rationale is that WKT consumers might not be ready yet for WKT2, so this should limit backward compatibility issues. In the future (a couple of years timeframe), this default WKT version might be upgraded when more consumers are WKT2-ready.</li>
<li>A <a href="https://trac.osgeo.org/gdal/wiki/rfc73_proj6_wkt2_srsbarn">RFC 73: Integration of PROJ6 for WKT2, late binding capabilities, time-support and unified CRS database</a> document was written to describe all the GDAL changes. After discussion with the community, this RFC has been approved.</li>
<li>As a result, all of the above-mentioned work has now been merged into GDAL master</li>
<li>Important practical point: GDAL master now depends on PROJ master (and ultimately on PROJ 6.0 once it is released)</li>
</ul>
<div style="text-align: justify;">
Consequently, on the pure development front, most of the work has now been completed. As all the changes done these last months deeply impact SRS-related functionality in GDAL and PROJ, we now rely on your careful testing to spot the inevitable issues that have not yet been detected by the respective automatic regression test suites. The earlier they are detected, the easier they will be to fix, in particular if they impact the API.</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-25952081458906091462018-12-28T05:12:00.000-08:002018-12-28T05:12:08.318-08:00SRS barn raising: 7th reportThis is the 7th progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort. <br />
<br />
<div style="text-align: justify;">
On the PROJ side, things are consolidating. Tens of "random" fixes have been pushed, the GDAL autotest suite having triggered a number of interesting cases. The C API has also been enhanced to accommodate the needs of GDAL and libgeotiff. We have also received feedback from an early adopter (a developer of a <a href="https://github.com/snowman2/pyproj/tree/gdalbarn">pyproj binding</a> based on PROJ master). The major development items have been the move of the WKT 1 syntax validation from GDAL to PROJ, as well as the development of an equivalent WKT 2 syntax validator in PROJ (this task has been useful to uncover a few minor issues in the draft of the future WKT2:2018 standard). A <a href="https://github.com/OSGeo/proj.4/pull/1203">reorganization of the PROJ source tree</a> (with a conversion of most C files to C++ files) has also been done, as a preliminary step for a <a href="https://github.com/OSGeo/proj.4/pull/1208">pull request</a> to better integrate the new ISO-19111 functionality I developed these last months with the existing C API.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Regarding libgeotiff, a v1.4.3 maintenance version has been released with the fixes of the last two years, before the new work to integrate PROJ master is done. libgeotiff development has been moved from Subversion to <a href="https://github.com/OSGeo/libgeotiff">https://github.com/OSGeo/libgeotiff</a>. As a preliminary step, continuous integration capability has been added to test compilation under Linux/GCC and Windows/Visual Studio, with a few runtime tests.</div>
<div style="text-align: justify;">
A <a href="https://github.com/OSGeo/libgeotiff/pull/2">pull request</a> is ready with the integration of PROJ master with libgeotiff. It features:</div>
<ul>
<li style="text-align: justify;">PROJ master / PROJ 6 as a required dependency of libgeotiff</li>
<li style="text-align: justify;">Use of the proj.db database to resolve the various coded values, mostly in the GTIFGetDefn() "normalization" function of libgeotiff, instead of using the .csv files previously generated from the EPSG database. Typically an EPSG code identifying a projected CRS is resolved into its base elements: projection method, projection parameters, base geodetic CRS, etc.</li>
<li style="text-align: justify;">Complete removal of those .csv files and associated functionality</li>
</ul>
<div style="text-align: justify;">
This work will be merged once the above-mentioned PROJ pull request, which affects the naming of the new C API functions, has been merged and taken into account. This gdalbarn branch of libgeotiff has also been successfully integrated into the internal libgeotiff copy of the GDAL gdalbarn branch.</div>
<br />
<div style="text-align: justify;">
Regarding GDAL, maintenance v2.3.3 and feature v2.4.0 versions have been released, for the same reasons as above.</div>
<div style="text-align: justify;">
Most methods of the OGRSpatialReference class have now been re-implemented to rely on the PROJ C API for queries and state changes, which avoids potentially lossy import / export to WKT 1. Similarly to the libgeotiff work, the importFromEPSG() functionality now relies on proj.db, and consequently all EPSG- or ESRI-related .csv files have been removed from the GDAL data directory. I've also <a href="https://lists.osgeo.org/pipermail/gdal-dev/2018-December/049501.html">drafted a plan</a> on how WKT 2 can be taken into account by GDAL raster drivers, and proposed changes to better handle the gap between the axis order mandated by the CRS authority (for example latitude first, longitude second for geographic CRS in the EPSG dataset) and the actual order of the values in raster metadata or vector geometries. The first part (use of OGRSpatialReference in raster drivers) has been implemented in the gdalbarn branch, and the second part is in good progress, with the drivers now advertising their data axis to CRS axis mapping. The ongoing work is to make OGRCoordinateTransformation use the PROJ API to automatically compute the best transformation pipeline, enabling late-binding capabilities.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-66037732021405278892018-11-30T06:05:00.000-08:002018-11-30T06:05:27.821-08:00SRS barn raising: 6th report<div style="text-align: justify;">
This is the sixth progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort. The pace of change has not yet slowed down, with a significant part of the work still being in PROJ, plus an initial integration in GDAL.</div>
<br />
<div style="text-align: justify;">
The major news item is that <a href="https://proj4.org/community/rfc/rfc-2.html">RFC2</a>, implementing the new capabilities (WKT-2 support, late-binding approach, SQLite database), has now been merged into PROJ master.</div>
<br />
<div style="text-align: justify;">
An initial integration of PROJ master into GDAL has been started in a <a href="https://github.com/rouault/gdal/tree/gdalbarn">custom GDAL branch</a>. This includes: </div>
<ul style="text-align: justify;">
<li>PROJ master, which will be released as 6.0, is now a required dependency of GDAL. It actually becomes the only required external third-party dependency (if we except the copies of a few libraries, such as libtiff, libgeotiff, etc., that have traditionally been included in the GDAL source tree)</li>
<li>The dozen continuous integration configurations have been modified to build PROJ master as a preliminary step.</li>
<li>Related to the above, we have included in PROJ a way to "version" its symbols. If PROJ is built with -DPROJ_RENAME_SYMBOLS in CFLAGS and CXXFLAGS, all its exported symbols are prefixed with "internal_". This enables GDAL to link against PROJ master while still using pre-compiled dependencies (such as libspatialite) that link against the system PROJ version, without risk of symbol clashes. This is particularly useful for running the GDAL autotests on continuous integration environments that use pre-packaged dependencies (or if you want to test the new GDAL without rebuilding all reverse dependencies of GDAL). This however remains a hack, and ultimately, once PROJ 6 has been released, all reverse dependencies should be rebuilt against it. (This solution was successfully used in the past, when GDAL had an internal libtiff 4.0 copy while the external libtiff used by some GDAL dependencies relied on the system libtiff 3.X.)</li>
<li>Compatibility mechanisms which were required to support older PROJ versions have been removed. In particular, the runtime loading (using dlopen() / LoadLibrary() mechanism) has been removed. It proved to cause code complication, and users frequently ran into headaches with different PROJ versions being loaded and clashing/crashing at runtime.</li>
<li>The OGRSpatialReference class, which implements CRS manipulation in GDAL, has been modified to use the new PROJ functions to import and export between WKT and PROJ strings. Previously GDAL had such code, which is now redundant with what PROJ offers. This preliminary integration led to a number of fixes in PROJ to ensure compatibility with GDAL's input and output for WKT 1 and PROJ strings. Besides "moving" code from GDAL to PROJ, a practical consequence is that the addition of a new projection method to PROJ will no longer require changes in GDAL for it to be usable for reprojection purposes.</li>
</ul>
<br />
There have been reflections on how the new code developed in PROJ can be used by the existing PROJ code. A <a href="https://github.com/OSGeo/proj.4/pull/1182">pull request</a> is currently under review and implements:<br />
<ul>
<li style="text-align: justify;">changes needed to remove the now obsolete EPSG, IGNF, esri and esri.extra files from the data/ directory, to rely instead on the proj.db database </li>
<li style="text-align: justify;">making the proj_create_crs_to_crs() API use the new late-binding approach to create transformation pipelines</li>
<li style="text-align: justify;">updating cs2cs to use that new API. </li>
<li style="text-align: justify;">list and address backward compatibility issues related to honouring official axis order</li>
</ul>
<div style="text-align: justify;">
Integration in GDAL continues, with the aim of using more of the new PROJ code. Typically, the OGRSpatialReference class that models the CRS/SRS in GDAL was up to now mostly a hierarchy of WKT nodes, where setter methods would directly create/modify/delete nodes, and getter methods query them. This approach was fine when you had to manage just one WKT version (with the caveat that it was also easy to produce invalid WKT representations lacking mandatory nodes). However, it is no longer appropriate now that we want to support multiple WKT versions. Our goal is to make OGRSpatialReference act rather on osgeo::proj::CRS objects (and its derived classes). Switching between the two abstractions is a non-trivial task, and doing it in a big-bang approach seemed risky, so we are doing it progressively, using a dual internal modelling. A OGRSpatialReference instance maintains a osgeo::proj::CRS object as its primary source and, for operations not yet converted to the new approach, falls back to translating it internally to WKT 1 to allow direct manipulation of the nodes, and then re-translates the updated WKT 1 representation back to a osgeo::proj::CRS object. Ultimately the proportion of methods using the fallback should decrease (it is not completely clear we can remove all of them, since direct node manipulation is spread across a significant number of GDAL drivers). The task is progressing slowly, because each change can subtly modify the final WKT 1 representation (nodes being added, numbers of significant digits changing), cause a number of unit tests to break (the GDAL autotest suite is made of 280 000 lines of Python code), and require analysis to determine whether there was a bug or just an expected result that is slightly altered.</div>
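<div style="text-align: justify;">
This transition strategy can be sketched with a toy example (purely illustrative class and method names; the real code manipulates osgeo::proj::CRS objects and a WKT 1 node tree in C++):</div>

```python
class CRS:
    """Toy stand-in for an osgeo::proj::CRS object: stores only a name."""
    def __init__(self, name):
        self.name = name

    def export_wkt1(self):
        return 'GEOGCS["%s"]' % self.name

    @classmethod
    def from_wkt1(cls, wkt):
        return cls(wkt.split('"')[1])


class SpatialReference:
    """The CRS object is the primary representation; a method not yet
    ported to the new API falls back to a WKT 1 round-trip."""
    def __init__(self, crs):
        self._crs = crs

    def legacy_set_name(self, name):
        wkt1 = self._crs.export_wkt1()             # fall back to WKT 1
        wkt1 = wkt1.replace(self._crs.name, name)  # "node" manipulation
        self._crs = CRS.from_wkt1(wkt1)            # re-import the result


srs = SpatialReference(CRS("WGS 84"))
srs.legacy_set_name("NAD83")
print(srs._crs.name)  # NAD83
```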
<div style="text-align: justify;">
Because of all the above impacts, we have decided to do an early release of GDAL master as GDAL 2.4.0 in December, with all the new features since GDAL 2.3, in order to be able to land this PROJ integration afterwards. A GDAL 2.5.0 release will hopefully follow around May 2019 with the result of the gdalbarn work.</div>
<br />
<br />
<div style="text-align: justify;">
Other side activities regarding collecting transformation grids:</div>
<div style="text-align: justify;">
<ul>
<li>Following a clarification
from IGN France on the open licensing of their <a href="https://geodesie.ign.fr/index.php?p=61&page=documentation">geodesy related resources</a>,
their CRS and transformation XML registry is now processed to populate
the IGNF objects in the proj.db database (the previous import
used the already processed IGNF file containing PROJ string, which
caused information losses). The associated vertical shift grids have
also been converted from their text-based format to the PROJ digestible
.gtx format, integrated in the <a href="https://github.com/OSGeo/proj-datumgrid/tree/master/europe">proj-datumgrid-europe package</a>, and they have been referenced in the database for transformations that use them.</li>
<li>The NGS GEOID 2012B vertical grids to convert between NAD83 ellipsoidal heights and NAVD88 heights have also been integrated in the <a href="https://github.com/OSGeo/proj-datumgrid/tree/master/north-america">proj-datumgrid-north-america package</a></li>
</ul>
</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-51045460625466221912018-10-31T04:17:00.001-07:002018-10-31T04:17:51.196-07:00SRS barn raising: 5th report<div style="text-align: justify;">
This is the fifth progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort. A lot of activity has again taken place this month on PROJ development.<br />
<br />
New conversion and transformation methods referenced in the EPSG database have been integrated:<br />
<ul>
<li>Projection methods: Lambert Conic Conformal (2SP Michigan) and Laborde Oblique Mercator</li>
<li>"Change of Vertical Unit", "Vertical offset" and other fixes relative to vertical component handling</li>
<li>"Axis order reversal" </li>
<li>"Affine parametric transformation", and its implementation in PROJ computation core</li>
<li>"Molodensky-Badekas" transformation, and its implementation in PROJ computation core </li>
</ul>
</div>
<div style="text-align: justify;">
Regarding database-related activities:<br />
<ul>
<li>Make concatenated operation building from proj.db robust to inverse sub-operations (the EPSG database lists chained operations, but does not indicate whether the forward or reverse path must be taken)</li>
<li>Reference transformation grids available in the proj-datumgrid-* packages, such as NAD83 -> NAD83(HPGN) grids, OSTN15_NTv2_OSGBtoETRS.gsb</li>
<li>Add a celestial_body table and celestial_body property to ellipsoid, as a provision to handle non-Earth bodies</li>
<li>Add a text_definition column to geodetic_crs and projected_crs to allow definition by PROJ string (or WKT)</li>
<li>Add the possibility of defining in 'other_transformation' a transformation defined by a PROJ pipeline</li>
<li>The createOperations() method that returns transformations between two CRS is now able to find a pivot CRS when no direct transformation path exists. One notable fact to underline is that the pivot is not necessarily WGS84, and several candidate pivots are explored. When transforming from CRS A to CRS B, the database is searched for all CRS C for which there is a referenced transformation from/to both CRS A and CRS B (advanced users may also restrict the candidate pivot(s) to use)</li>
<li>Import the PROJ.4 definitions currently contained in <a href="https://github.com/OSGeo/proj.4/blob/master/data/IGNF">data/IGNF</a> into the database in a relational form. A script to directly use the <a href="http://librairies.ign.fr/geoportail/resources/IGNF.xml">official registry</a> as the source, instead of the PROJ.4 strings that derive from it, has also been developed but is not used yet.</li>
<li>Import of the CRS definitions and transformations between horizontal CRS from the CSV files of the published <a href="https://github.com/Esri/projection-engine-db-doc/tree/master/csv">Esri Projection Engine Database Documentation</a>. The projected CRS are imported with their ESRI WKT definition put directly in the database, and work is in progress to ingest this WKT1 variant into the WKT2-based form used internally by PROJ.</li>
<li>Update the CRS and transformations to EPSG dataset v9.5.4</li>
</ul>
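<div style="text-align: justify;">
The pivot exploration mentioned above can be sketched in a few lines (illustrative data and names, not the actual createOperations() implementation):</div>

```python
# Illustrative sketch of pivot-CRS discovery: given a table of known
# transformations, find every CRS C that links A and B when no direct
# A -> B operation is registered. Note that WGS84 is just one of the
# candidates, not a privileged hub.
known_ops = {            # (source, target) pairs of registered operations
    ("CRS_A", "WGS84"), ("WGS84", "CRS_B"),
    ("CRS_A", "ETRS89"), ("CRS_B", "ETRS89"),
}

def links(x, y):
    """A registered operation is usable in either direction."""
    return (x, y) in known_ops or (y, x) in known_ops

def candidate_pivots(a, b):
    crss = {c for pair in known_ops for c in pair} - {a, b}
    return sorted(c for c in crss if links(a, c) and links(c, b))

print(candidate_pivots("CRS_A", "CRS_B"))  # ['ETRS89', 'WGS84']
```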
</div>
<div style="text-align: justify;">
The addition of the DerivedEngineeringCRS, DerivedParametricCRS and DerivedTemporalCRS classes marks the completion of the implementation of the full ISO-19111:2018 CRS modelling.<br />
<br />
A rather tedious task consisted in optimizing the code to make the generated binary lighter, mostly by avoiding a lot of unnecessary instantiations of temporary C++ objects (which involved frequently looking at disassembled output to locate them). This has enabled an optimized build to go down from 3.1 to 2.6 MB.<br />
<br />
The Doxygen-generated documentation of the C++ API has been integrated with the general PROJ documentation in ReST format, with the Breathe module.</div>
<br /><div style="text-align: justify;">
An initial C API that makes part of the C++ functionality available to C users has also been added.<br />
<br />
Finally, a <a href="https://proj4.org/community/rfc/rfc-2.html">RFC</a> has been written to officially submit this work for PROJ PSC approval (doubling the size of a codebase and changing its language requirements is not a mundane detail). This RFC has now been officially adopted, which will make it possible to ultimately merge this work into PROJ master. You may skim through it to look at <a href="https://proj4.org/community/rfc/rfc-2.html#utilities">a few examples</a> of the use of the projinfo utility that demonstrate part of the new capabilities developed.<br />
<br /></div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-27709124463206780042018-09-28T04:44:00.001-07:002018-09-28T05:11:04.967-07:00SRS barn raising: 4th report<div style="text-align: justify;">
This is the fourth progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort.</div>
<br />
<div style="text-align: justify;">
The in-progress task at the time of the last report was the addition of a createFromPROJString() method that takes a PROJ string and builds the corresponding ISO-19111 objects. It has now been completed for ProjectedCRS. Not all arbitrary valid PROJ strings can currently be mapped to ISO-19111 objects, but at least strings expressing CRS definitions can. A few constructs involving Helmert transforms are also parsed as CoordinateOperation.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A few random tasks completed:</div>
<ul style="text-align: justify;">
<li>Map time-dependent Helmert transforms, and fixes for existing Helmert transforms</li>
<li>Map Molodensky and Abridged Molodensky transformation methods from/to PROJ strings</li>
<li>Map Longitude rotation transformation method</li>
<li>Update to recognize VRF and TRF WKT2-2018 keywords</li>
<li>Add import/export of VERTICALEXTENT and TIMEEXTENT WKT2 elements</li>
<li>Add import/export of WKT2:2018 USAGE node</li>
<li>Progress in implementation of "esoteric" ISO-19111 classes: DatumEnsemble, DynamicVerticalReferenceFrame, OrdinalCS, DerivedProjectedCRS, EngineeringCRS, ParametricCRS, DerivedVerticalCRS</li>
<li>Several fixes</li>
</ul>
<br />
<div style="text-align: justify;">
The most interesting task regarding the high-level objectives of the roadmap of the barn campaign is the creation of a proj.db SQLite3 database containing CRS (and related objects) and coordinate operations (datum shifts, ...) definitions. The workflow to build it is:</div>
<ol style="text-align: justify;">
<li>Import EPSG dataset PostgreSQL .sql dumps</li>
<li>Run scripts/build_db.py, which will ingest those dumps in a temporary SQLite3 database, extract the needed information from it, and marshal it into a more digestible form for our final proj.db. At the end, the script outputs new .sql scripts in the data/sql subdirectory of the PROJ directory. We keep those text files in version control, for better traceability of changes.</li>
<li>At make time, we build the proj.db database by importing those .sql scripts.</li>
</ol>
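The shape of that workflow can be sketched in miniature with Python's standard sqlite3 module. The table names and content below are invented for illustration; they are not the actual EPSG or proj.db schemas:

```python
import sqlite3

# Step 1-2 in miniature: ingest source SQL into a temporary database,
# extract what we need, and emit new .sql text that can be replayed
# later (the "make time" step) to build the final database.
tmp = sqlite3.connect(":memory:")
tmp.executescript("""
CREATE TABLE epsg_dump(code TEXT, name TEXT);
INSERT INTO epsg_dump VALUES ('4326', 'WGS 84');
""")

# Marshall into a more digestible form for a proj.db-like database.
out_sql = ["CREATE TABLE crs(auth_name TEXT, code TEXT, name TEXT);"]
for code, name in tmp.execute("SELECT code, name FROM epsg_dump"):
    out_sql.append(f"INSERT INTO crs VALUES ('EPSG', '{code}', '{name}');")

# Step 3 in miniature: replay the generated scripts to build the database.
final = sqlite3.connect(":memory:")
final.executescript("\n".join(out_sql))
row = final.execute(
    "SELECT name FROM crs WHERE auth_name = 'EPSG' AND code = '4326'").fetchone()
assert row == ("WGS 84",)
```

Keeping the intermediate .sql text files in version control (rather than the binary database) is what makes changes reviewable from one EPSG release to the next.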
<div style="text-align: justify;">
Steps 1 and 2 are done (typically by PROJ developers) each time an update to a newer version of the EPSG dataset is needed.</div>
<div style="text-align: justify;">
Step 3 is done automatically at PROJ build time, from a git clone or a tarball of an official release.</div>
<br />
<div style="text-align: justify;">
The proj.db structure allows for multiple authorities. Instead of having a single code column to identify and reference objects, we use a (authority_name, code) tuple as a key, both columns being of text type for better generality. So ('EPSG', '4326') or ('IGNF', 'LAMB1') are possible. For now, only EPSG-derived objects are in the database. Importing other dictionaries is a task for later.</div>
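A minimal sketch of this keying scheme, using Python's stdlib sqlite3 module (the table layout and the IGNF entry name are invented for illustration, not the actual proj.db schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Composite (auth_name, code) text primary key, instead of a single
# numeric code column, so that several authorities can coexist.
db.execute("""
CREATE TABLE crs(
    auth_name TEXT NOT NULL,
    code      TEXT NOT NULL,
    name      TEXT NOT NULL,
    PRIMARY KEY (auth_name, code)
)""")
db.executemany("INSERT INTO crs VALUES (?, ?, ?)",
               [("EPSG", "4326", "WGS 84"),
                ("IGNF", "LAMB1", "Lambert I")])

# Both ('EPSG', '4326') and ('IGNF', 'LAMB1') resolve without ambiguity.
(name,) = db.execute(
    "SELECT name FROM crs WHERE auth_name = ? AND code = ?",
    ("IGNF", "LAMB1")).fetchone()
assert name == "Lambert I"
```

Text-typed codes also accommodate authorities whose identifiers are not numeric, such as IGNF's.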
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Having a database is good, but using it is better. So the next task was to implement a factory class able to build an object of our ISO-19111 modelling from its (auth_name, code) tuple. This is now completed for all object categories of the database. The in-progress task is the generalization of the current createOperation(sourceCRS, targetCRS) method, which creates a coordinate operation without any external input other than the definition of the CRSs, into a createOperations(sourceCRS, targetCRS, context) method that also uses the proj.db database to find registered coordinate operations (typically between the geographic CRSs), and takes into account a specified area of interest and desired accuracy, to propose a list of candidate coordinate operations (chaining, for example, the inverse projection, a datum shift and a forward projection). I'm also working on a new `projinfo` utility, similar in purpose to `gdalsrsinfo`, offering the possibility to ingest PROJ strings (legacy PROJ.4 format and new PROJ pipelines), WKT 1, WKT 2 and authority codes, and to output them as PROJ strings (legacy and new), WKT 1, WKT2:2015 or WKT2:2018. A mode to list the coordinate operations possible between two CRSs will also be available.<br />
<br />
One interesting statistic: the number of lines of C++ code (including blank lines and comments) added to PROJ by this work is now greater than the amount of historical C code: 47 000 lines (14 K of which are tests) vs 43 000. </div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-91692363877131706052018-08-29T07:23:00.000-07:002018-08-29T07:23:20.288-07:00SRS barn raising: 3rd reportThis is the third progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort.<br />
<br />
<div style="text-align: justify;">
In the last month, the main task was to continue, and finish, the mapping of all GDAL currently supported projection methods between their different representations as WKT 2, WKT 1 and PROJ string: LCC_2SP, LCC_2SP_Belgium, Modified Azimuthal Equidistant, Guam Projection, Bonne, (Lambert) Cylindrical Equal Area, GaussSchreiberTransverseMercator, CassiniSoldner, EckertI to VI, EquidistantCylindricalSpherical, Gall, GoodeHomolosine, InterruptedGoodeHomolosine, GeostationarySatelliteSweepX/Y, International Map of the World Polyconic, Krovak North Oriented and Krovak, LAEA, Mercator_1SP and Mercator_2SP, WebMercator (including GDAL WKT1 import/export tricks), Mollweide, ObliqueStereographic and Orthographic, American Polyconic, PolarStereographicVariantA and PolarStereographicVariantB, Robinson and Sinusoidal, Stereographic, VanDerGrinten, Wagner I to VII, QuadrilateralizedSphericalCube, SphericalCrossTrackHeight, Aitoff, Winkel I and II, Winkel Tripel, Craster_Parabolic, Loximuthal and Quartic_Authalic.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The task was tedious, but necessary. For some cases, this involved cross-checking formulas in the <a href="http://www.epsg.org/Portals/0/373-07-2.pdf">EPSG "Guidance Note 7, part 2 Coordinate Conversions & Transformations including Formulas"</a>, the PROJ implementation and Snyder's "<a href="https://pubs.usgs.gov/pp/1395/report.pdf">Map Projections - A Working Manual</a>", because of ambiguities in some projection names. Typically the <a href="https://www.epsg-registry.org/export.htm?gml=urn:ogc:def:method:EPSG::9809">ObliqueStereographic in EPSG</a> is not the Oblique Stereographic of Snyder: the former is implemented as the Oblique Stereographic Alternative (<a href="https://proj4.org/operations/projections/sterea.html">sterea</a>) in PROJ, and the latter as the Oblique Stereographic (<a href="https://proj4.org/operations/projections/stere.html">stere</a>). The parameter names in WKT 2 / EPSG also tend to be much more specific than in GDAL WKT 1. Where GDAL WKT 1 mostly has a "latitude_of_origin" parameter mapping to the lat_0 PROJ parameter, WKT 2 parameter names tend to better reflect the mathematical characteristics of the projection, distinguishing between "Latitude of natural origin", "Latitude of projection centre" and "Latitude of false origin".</div>
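To illustrate the kind of disambiguation involved, here is a hypothetical sketch (in Python, not the actual PROJ code) of a mapping table from a (projection method, WKT 1 parameter) pair to the more specific WKT 2 name; the sample entries follow the EPSG naming conventions mentioned above:

```python
# Hypothetical sketch: one generic GDAL WKT1 parameter name can map to
# several more specific WKT2/EPSG parameter names, depending on the
# projection method. Entries shown are illustrative, not exhaustive.
WKT1_TO_WKT2_PARAM = {
    ("Lambert_Conformal_Conic_2SP", "latitude_of_origin"):
        "Latitude of false origin",
    ("Oblique_Stereographic", "latitude_of_origin"):
        "Latitude of natural origin",
    ("Hotine_Oblique_Mercator", "latitude_of_center"):
        "Latitude of projection centre",
}

def wkt2_param_name(method, wkt1_name):
    """Resolve a WKT1 parameter to its WKT2 name for a given method,
    falling back to the WKT1 name when no specific mapping is known."""
    return WKT1_TO_WKT2_PARAM.get((method, wkt1_name), wkt1_name)

assert wkt2_param_name("Oblique_Stereographic", "latitude_of_origin") \
    == "Latitude of natural origin"
```

The point is that the mapping must be keyed on the projection method, not on the parameter name alone, which is what made the exercise tedious.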
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The currently ongoing task is to implement a method that takes a PROJ string and builds the corresponding ISO-19111 objects. This is done for GeographicCRS (+proj=longlat), and in progress for ProjectedCRS. When this is completed, we will have the infrastructure to convert in all directions between PROJ strings, WKT 1 and WKT 2.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When digging into the PROJ code, I also uncovered a number of issues in the Helmert implementation (confusion for the rotational parameters regarding the "Position Vector" vs "Coordinate frame" convention), in the handling of the not-so-well-known +geoc flag for geocentric latitudes, and in the handling of vertical units for geographic CRSs with the new PROJ API. All of those fixes have been independently merged in PROJ master, so as to be available for the upcoming PROJ 5.2.0, which should be released around mid-September (to remove any confusion, this release will not yet include all the WKT 2 related work).</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-3369574503850628612018-07-26T07:56:00.000-07:002018-07-26T07:56:13.825-07:00SRS barn raising: 2nd reportThis is the second progress report of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort.<br />
<br />
Since the <a href="https://erouault.blogspot.com/2018/06/the-barn-is-raising.html">previous report</a>, a number of topics have been addressed:<br />
<div style="text-align: justify;">
- extension of the class hierarchy to implement BoundCRS (the generalization of the WKT1 TOWGS84 concept; this concept only exists in WKT 2 and has not been modelled in ISO-19111, so I went with my own modelling), TimeCRS and DerivedGeodeticCRS</div>
<div style="text-align: justify;">
- implementation of the exportToPROJ() method for CRS and related objects, and CoordinateOperation</div>
<div style="text-align: justify;">
- addition of all documentation needed at class and method level so that Doxygen passes without warnings</div>
<div style="text-align: justify;">
- implementation of the <span class="blob-code-inner ">CoordinateOperation::</span>createOperation() method that can instantiate a transformation between two CRSs. For now, it does not yet use the operations of the EPSG database, but it can already take into account the hints of a BoundCRS.</div>
<div style="text-align: justify;">
- implementation of a number of Transformations: geocentric translation, position vector transformation, coordinate frame rotation, NTv2, GravityRelatedHeightToGeographic3D, VERTCON.</div>
<div style="text-align: justify;">
- start of the mapping of all GDAL currently supported projection methods. For now: UTM, Transverse Mercator, Transverse Mercator South Oriented, Two Point Equidistant, Tunisia Mapping Grid, Albers Conic Equal Area, Lambert Conic Conformal 1SP, New Zealand Map Grid. Several tens are obviously still missing.</div>
<div style="text-align: justify;">
- addition of an isEquivalentTo() method to compare the various objects. </div>
<div style="text-align: justify;">
- and of course, extensive unit testing of all the above work.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The result of this continued work can be followed in this <a href="https://github.com/OSGeo/proj.4/pull/1040">pull request.</a></div>
<br />
<div style="text-align: justify;">
As a related effort, I've coordinated with the OGC CRS Working Group to provide my comments on the upcoming ISO 19162 / WKTv2 2018 revision.</div>
<br />Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-63834696400507593802018-06-18T04:40:00.001-07:002018-06-18T04:40:43.844-07:00The barn is raising<div style="text-align: justify;">
Thanks to the support given by the sponsors of the <a href="https://gdalbarn.com/">GDAL SRS barn</a> effort, I have been able to kick in the first works in the past weeks. The work up to now has been concentrated on the PROJ front.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The first step was to set a foundation of C++ classes that implement the <a href="http://portal.opengeospatial.org/files/?artifact_id=39049">ISO-19111 / OGC Topic 2 "Referencing by coordinates"</a> standard. Actually I have anticipated the future adoption of the 18-005r1 <a href="http://www.opengeospatial.org/standards/requests/166">2018 revision</a> of the standard that takes into account the latest advances in the modelling of coordinate reference systems (in particular dynamic reference frames, geoid-based vertical coordinate reference systems, etc.), which will be reflected in the corresponding update of the WKT2:2018 standard and future updates of the EPSG dataset. If you are curious, you can skim through the resulting PROJ <a href="http://even.rouault.free.fr/proj_cpp_api/html/inherits.html">class hierarchy</a>, which is really close to the abstract specification (a number of those classes lack a real implementation for now). With the agreement of the newly born <a href="https://proj4.org/community/rfc/rfc-1.html">PROJ project steering committee</a>, I have opted for C++11, which offers a number of useful modern features to reduce boilerplate and concentrate on the interesting aspects of the work.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
On the functional front, there is already support to read WKT1 (its GDAL variant for now) and WKT2 strings and build a subset of the aforementioned C++ objects, and conversely to dump those C++ objects as WKT1 and WKT2 strings. In particular you can import from WKT1 and export to WKT2, or the reverse (within the limitations of each format). So this abstract modelling (quite close to WKT2 of course) effectively serves its purpose of helping to be independent from the actual representation of the CRS. As I mentioned an early adoption of the OGC Topic 2 standard, I've similarly taken into account the future WKT2:2018 (OGC 18-010) standard, which aligns well with the abstract specification. In the API, the user can select whether to export according to the currently adopted version <a href="http://docs.opengeospatial.org/is/12-063r5/12-063r5.html">WKT2:2015 (OGC </a><span><a href="http://docs.opengeospatial.org/is/12-063r5/12-063r5.html">12-063r5)</a>, or to the future WKT2:2018 revision.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The result of those first steps can be followed in this <a href="https://github.com/OSGeo/proj.4/pull/1040">pull request.</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another task that has been accomplished is the addition of the Google Test C++ testing framework to PROJ (thanks to Mateusz Loskot for his help with the CMake integration), so all those new features can be correctly tested locally and on all platforms supported by PROJ continuous integration setup.</div>
<br /><div style="text-align: justify;">
There are many future steps to do just on the PROJ front:</div>
<ul>
<li>implement remaining classes</li>
<li>code documentation</li>
<li>comprehensive support of projection methods (at least the set currently supported by GDAL)</li>
<li>import from and export to PROJ strings for CRS definitions and coordinate operations</li>
<li>use of the EPSG dataset</li>
</ul>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com1tag:blogger.com,1999:blog-7797431205725571762.post-72662704154549626542018-06-15T07:31:00.000-07:002018-06-16T06:47:55.323-07:00SQLite and POSIX advisory locks<div style="text-align: justify;">
These past couple of days, I was working on implementing multi-layer transaction support for GeoPackage datasources (for QGIS 3.4). Multi-layer transaction is an advanced functionality of QGIS (you have to enable it in the project settings), initially implemented for PostgreSQL connections, where several layers can be edited together so as to have atomic modifications when editing them. Modifications are automatically sent to the database, using SQL savepoints to implement undo/redo operations, instead of being queued in memory and committed at once when the user stops editing the layer.</div>
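The savepoint mechanism can be sketched with Python's stdlib sqlite3 module; this is a simplification of the idea, not the actual QGIS code:

```python
import sqlite3

# Each edit opens a savepoint; "undo" rolls back to it. Changes live in
# the database transaction rather than being queued in memory.
con = sqlite3.connect(":memory:")
con.isolation_level = None            # manage transactions explicitly
con.execute("CREATE TABLE layer(fid INTEGER PRIMARY KEY, name TEXT)")

con.execute("BEGIN")
con.execute("SAVEPOINT edit1")
con.execute("INSERT INTO layer(name) VALUES ('first feature')")
con.execute("SAVEPOINT edit2")
con.execute("INSERT INTO layer(name) VALUES ('second feature')")

con.execute("ROLLBACK TO SAVEPOINT edit2")   # undo the second edit only
assert con.execute("SELECT COUNT(*) FROM layer").fetchone()[0] == 1

con.execute("COMMIT")                        # user stops editing: commit all
assert con.execute("SELECT name FROM layer").fetchone()[0] == "first feature"
```

Because savepoints nest, each user-visible edit can get its own savepoint, and undo is simply a ROLLBACK TO the matching one.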
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
While debugging my work during development, I stumbled upon a heisenbug. From time to time, the two auxiliary files attached to a SQLite database opened in <a href="https://www.sqlite.org/wal.html">Write Ahead Logging (WAL)</a> mode, suffixed -wal and -shm, would suddenly disappear, whereas the file was still opened by QGIS. As those files are absolutely required, the consequence was that subsequent operations on the database failed: new readers (in the QGIS process) would be denied opening the file, and QGIS could not commit any new change to it. When the database was closed, it returned to a proper state (which shows the robustness of sqlite). After some time, I realized that my issue arose exactly when I observed the database being edited by QGIS with an external ogrinfo on it (another way to reproduce the issue is to open a second QGIS instance on the same file and close it). I indeed used ogrinfo to check that the state of the database was consistent during the editing operations. Okay, so instead of a random bug, I now had a perfectly reproducible bug. Half of the way to solving it, right?</div>
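For reference, the normal lifecycle of those sidecar files can be observed with Python's stdlib sqlite3 module: they exist while a connection is open in WAL mode, and are cleaned up when the last connection closes cleanly:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode=WAL")   # switch to Write Ahead Logging
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()

# While a connection is open, the -wal and -shm sidecar files must exist.
assert os.path.exists(path + "-wal")
assert os.path.exists(path + "-shm")

# When the last connection closes cleanly, sqlite checkpoints the WAL
# and removes both sidecar files.
con.close()
assert not os.path.exists(path + "-wal")
assert not os.path.exists(path + "-shm")
```

The bug described above amounts to the second half of this lifecycle happening while connections were still open.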
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
How come ogrinfo, which involves read-only operations, could cause those -wal and -shm files to disappear? I had some recollection of code I had written in the OGR SQLite driver regarding this. When a dataset opened in read-only mode is closed by OGR, it checks if there's a -wal file still existing (which can happen if a database was not cleanly closed, for example by a killed process), and if so, it re-opens it temporarily in update mode, does a dummy operation on it, and closes it. If the ogrinfo process is the only one with a connection to the database, libsqlite removes the -wal and -shm files automatically (OGR does not directly remove the files; it relies on libsqlite's wisdom to determine whether they can be removed or not). But wait, in my above situation, ogrinfo was not the exclusive process operating on the database: QGIS was still editing it... Would that be a bug in the venerable libsqlite??? (spoiler: no)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I tried to reproduce the situation by replacing QGIS with a plain sqlite console opening the file, and running ogrinfo on it. No accidental removal of the -wal and -shm files. Okay, so what is the difference between QGIS and the sqlite console (besides QGIS having like one million extra lines of code ;-))? Well, QGIS doesn't directly use libsqlite3 to open GeoPackage databases, but uses the OGR GPKG driver. So instead of opening with QGIS or a sqlite3 console, what if I opened with the OGR GPKG driver? Bingo: in that situation, I could also reproduce the issue. So something in OGR was the culprit. I will spare you the other details, but in the end it turned out that if OGR itself opened a .gpkg file using the standard file API while libsqlite3 had it open, chaos would result. This situation can easily happen, since when opening a dataset, OGR has to open the underlying file to at least read its header and figure out which driver should handle it. So the sequence of operations is normally:</div>
<div style="text-align: justify;">
1) the GDALOpenInfo class opens the file</div>
<div style="text-align: justify;">
2) the OGR GeoPackage driver realizes this file is for it, and uses the sqlite3_open() API to open it</div>
<div style="text-align: justify;">
3) the GDALOpenInfo class closes the file it has opened in step 1 (libsqlite3 still manages its own file handle)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When modifying the above sequence so that 3) is executed before 2), the bug would not appear. At that point, I had some recollection that sqlite3 used POSIX advisory locks to handle concurrent accesses, and that there were some issues with that POSIX API. Digging into the sqlite3.c source code revealed a very interesting 86-line-long comment about how POSIX advisory locks are broken by design. The main brokenness is of course that they are advisory and not mandatory, but as this is indicated in the name, one cannot really complain about that being a hidden feature. The most interesting finding was: """If you close a file descriptor that points to a file that has locks, all locks on that file that are owned by the current process are released.""" Bingo: that was just what OGR was doing.</div>
<div style="text-align: justify;">
My above workaround (making sure the file is closed before sqlite opens it and sets its locks) was OK for a single opening of a file in a process. But what if the user wants to open a second connection on the same file (which arises easily in the QGIS context)? The rather ugly solution I came up with was that the OGR GPKG driver warns the GDALOpenInfo class not to try to open a given file while it is still opened by the driver, and passes it the file header it would be supposed to find if it could open the file, so that the driver identification logic can still work. Those fixes are queued for GDAL 2.3.1, whose release candidate is planned for next Friday.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Conclusions:</div>
<ul>
<li>never ever open (it is actually the close that hurts) a SQLite database with the regular file API while libsqlite3 is operating on it (in the same process)</li>
<li>POSIX advisory locks are awful. </li>
</ul>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com2tag:blogger.com,1999:blog-7797431205725571762.post-5161847657239136822017-10-12T09:41:00.002-07:002017-10-12T09:41:47.176-07:00Optimizing JPEG2000 decoding<div style="text-align: justify;">
Over this summer I have spent 40 days (*) in the guts of the <a href="http://www.openjpeg.org/">OpenJPEG</a> open-source library (BSD 2-clause licensed) optimizing the decoding speed and memory consumption. The result of this work is now available in the <a href="http://www.openjpeg.org/2017/10/04/OpenJPEG-2.3.0-released">OpenJPEG 2.3.0 release</a>.</div>
<br />
<div style="text-align: justify;">
For those who are not familiar with <a href="https://en.wikipedia.org/wiki/JPEG_2000">JPEG-2000</a>, and they have a good excuse given its complexity, this is a standard for image compression that supports lossless and lossy methods. It uses discrete wavelet transform for multi-resolution analysis, and a context-driven binary arithmetic coder for the encoding of bit plane coefficients. When you go into the depths of the format, what is striking is the number of independent variables that can be tuned:</div>
<br />
- use of tiling or not, and tile dimensions<br />
- number of resolutions<br />
- number of quality layers <br />
- code-block dimensions<br />
<div style="text-align: justify;">
- 6 independent options regarding how code-blocks are encoded (code-block styles): use of selective arithmetic coding bypass, reset of context probabilities on coding pass boundaries, termination on each coding pass, vertically stripe causal context, predictable termination, and segmentation symbols. Some can bring decoding speed advantages (notably selective arithmetic coding bypass), at the price of lower compression efficiency. Others might help hardware-based implementations. Others can help detect corruption in the codestream (predictable termination)</div>
<div style="text-align: justify;">
- spatial partition of code-blocks into so-called precincts, whose dimension may vary per resolution</div>
<div style="text-align: justify;">
- progression order, i.e. the criterion to decide how packets are ordered, which is a permutation of the 4 variables Precinct, Component, Resolution and Layer. The standard allows for 5 different permutations. To add extra fun, the progression order may be configured to change several times among the 5 possible ones (something I haven't yet had the opportunity to really understand)</div>
- division of packets into tile-parts<br />
- use of multi-component transform or not<br />
- choice of lossless or lossy wavelet transforms<br />
- use of start of packet / end of packet markers<br />
- use of Region Of Interest, to have higher quality in some areas<br />
<div style="text-align: justify;">
- choice of image origin and tiling origins with respect to a reference grid (the image and tile origin are not necessarily pixel (0,0))</div>
<br />
<div style="text-align: justify;">
And if that was not enough, some/most of those parameters may vary per-tile! If you already found that TIFF/GeoTIFF had too many parameters to tune (tiling or not, pixel or band interleaving, compression method), JPEG-2000 is probably one or two orders of magnitude more complex. JPEG-2000 is truly a technological and mathematical jewel. But needless to say that having a compliant JPEG-2000 encoder/decoder, which OpenJPEG is (it is an official reference implementation of the standard) is already something complex. Having it perform optimally is yet another target.</div>
<br />
<div style="text-align: justify;">
Prior to this latest optimization round, I had already worked on enabling multi-threaded decoding at the code-block level, since code-blocks can be decoded independently (once you've re-assembled from the codestream the bytes that encode a code-block), and in the inverse wavelet transform as well (during the horizontal pass, resp. the vertical pass, rows, resp. columns, can be transformed independently). But the single-threaded case had yet to be improved. Roughly, around 80 to 90% of the time during JPEG-2000 decoding is spent in the context-driven binary arithmetic decoder, around 10% in the inverse wavelet transform, and the rest in other operations such as the multi-component transform. I managed to get <a href="https://github.com/uclouvain/openjpeg/pull/945">around 10% improvement</a> in the global decompression time by porting to the decoder an optimization that had been proposed by Carl Hetherington for the encoding side, in the code that determines which bit of a wavelet transformed coefficient must be processed during which coding pass. The trick here was to reduce the memory needed for the context flags, so as to decrease the pressure on the CPU cache. Other optimizations in that area have consisted in making sure that some critical variables are kept preferably in CPU registers rather than in memory. I've spent a good deal of time looking at the disassembly of the compiled code.<br />
</div>
<div style="text-align: justify;">
I've also <a href="https://github.com/uclouvain/openjpeg/pull/957">optimized the reversible (lossless) inverse transform</a> to use the Intel SSE2 (or AVX2) instruction sets to process several rows at a time, which can result in up to a 3x speed-up for that stage (so a global 3% improvement)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I've also worked on reducing the memory consumption needed to decode images, by <a href="https://github.com/uclouvain/openjpeg/pull/968">removing the use of intermediate buffers when possible</a>. The result is that the amount of memory needed to do full-image decoding was reduced by a factor of 2.4.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another major work direction was to optimize speed and memory consumption for sub-window decoding. Up to now, the minimal unit of decompression was a tile. Which is OK for tiles of reasonable dimensions (let's say 1024x1024 pixels), but definitely not for images that don't use tiling and hardly fit into memory. In particular, OpenJPEG couldn't open images of more than 4 billion pixels. The work consisted of 3 steps:</div>
<div style="text-align: justify;">
- <a href="https://github.com/uclouvain/openjpeg/pull/990">identifying which precincts and code-blocks are needed</a> for the reconstruction of a spatial region</div>
<div style="text-align: justify;">
- <a href="https://github.com/uclouvain/openjpeg/pull/1001">optimize the inverse wavelet transform</a> to operate only on rows and columns needed</div>
<div style="text-align: justify;">
- <a href="https://github.com/uclouvain/openjpeg/pull/1010">reducing the allocation of buffers</a> to the amount strictly needed for the subwindow of interest</div>
<div style="text-align: justify;">
The overall result is that decoding time and memory consumption are now roughly proportional to the size of the subwindow to decode, whereas they were previously constant. For example, decoding 256x256 pixels in a 13498x9944x3 band image now takes only 190 ms, versus about 40 seconds before.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As a side activity, I've also fixed 2 different annoying bugs that could cause lossless encoding to not be lossless <a href="https://github.com/rouault/openjpeg/commit/73d1510d473b7dcfccfdee57e0e511e6791d5091">for some combinations of tile sizes and number of resolutions</a>, or when <a href="https://github.com/rouault/openjpeg/commit/81c5311758a0ae1f1aea349a6ee0bca2a238fa79">some code-block style options</a> were used.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I've just updated the <a href="http://gdal.org/frmt_jp2openjpeg.html">GDAL OpenJPEG driver</a> (in GDAL trunk) to be more efficient when dealing with untiled JPEG-2000 images.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are many more things that could be done in the OpenJPEG library :</div>
<div style="text-align: justify;">
- port a number of optimizations to the encoding side: multi-threading, discrete wavelet transform optimizations, etc.</div>
<div style="text-align: justify;">
- on the decoding side, reduce the memory consumption again, particularly in the untiled case. Currently we need to ingest into memory the whole codestream for a tile (so the whole compressed file, for an untiled image)</div>
<div style="text-align: justify;">
- linked to the above, use of TLM and PLT marker segments (kind of indexes to tiles and packets)</div>
<div style="text-align: justify;">
- on the decoding side, investigate further improvements for the code specific of irreversible / lossy compression</div>
<div style="text-align: justify;">
- have the opj_decompress utility make better use of the API and consume less memory. Currently it decodes the full image into memory instead of proceeding by chunks (you won't have this issue when using gdal_translate)</div>
<div style="text-align: justify;">
- investigate how using GPGPU capabilities (CUDA or OpenCL) could help reduce the time spent in context-driven binary arithmetic decoder.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://www.spatialys.com/en/contact/">Contact me</a> if you are interested in some of those items (or others !)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<br />
<hr />
(*) funding provided by academic institutions and archival organizations, namely
<br />
<ul>
<li><a href="https://wellcomelibrary.org/">Wellcome Library</a></li>
<li><a href="https://www.stanford.edu/">Stanford University</a></li>
<li><a href="https://www.kb.nl/en">Nationale Bibliotheek van Nederland (KBNL)</a></li>
<li><a href="https://www.umich.edu/">University of Michigan</a></li>
<li><a href="http://www.ucla.edu/">University of California, Los Angeles (UCLA)</a></li>
</ul>
… And logistic support from the International Image Interoperability Framework (<a href="http://iiif.io/">IIIF</a>), the Council on Library and Information Resources (<a href="https://www.clir.org/">CLIR</a>), <a href="http://www.intopix.com/">intoPIX</a>, and of course the Image and Signal Processing Group (<a href="http://sites.uclouvain.be/ispgroup/index.php/Main/HomePage">ISPGroup</a>) from University of Louvain (<a href="https://uclouvain.be/">UCL</a>, Belgium) hosting the OpenJPEG project.Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-4469156388850954582017-10-11T10:48:00.002-07:002017-10-11T10:48:31.864-07:00GDAL and cloud storageIn the past weeks, a number of improvements related to access to cloud storage have been committed to GDAL trunk (future GDAL 2.3.0)<br />
<br />
<h3>
Cloud based virtual file systems </h3>
<br />
<div style="text-align: justify;">
There was already support to access private data in Amazon S3 buckets through the <a href="http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsis3">/vsis3/</a> virtual file system (VFS). Besides a few robustness fixes, a few new capabilities have been added, like creation and deletion of directories inside a bucket with VSIMkdir() / VSIRmdir(). The authentication methods have also been extended to support, beyond the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables, the other ways accepted by the "aws" command line utilities, that is to say storing credentials in the ~/.aws/credentials or ~/.aws/config files. If GDAL is executed from an Amazon EC2 instance that has been assigned rights to buckets, GDAL will automatically fetch the instance profile credentials.</div>
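The lookup order for the S3 credentials can be sketched as follows (a simplified, hypothetical rendition in Python; the actual GDAL implementation is in C++ and also handles ~/.aws/config and the EC2 instance metadata endpoint):

```python
import os
from configparser import ConfigParser
from pathlib import Path

def find_aws_credentials(env=os.environ, home=None):
    """Return (access_key_id, secret) or None, mimicking the lookup order:
    environment variables first, then the ~/.aws/credentials file."""
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return key_id, secret
    cred_file = (home or Path.home()) / ".aws" / "credentials"
    if cred_file.exists():
        cp = ConfigParser()
        cp.read(cred_file)
        if cp.has_section("default"):
            return (cp.get("default", "aws_access_key_id", fallback=None),
                    cp.get("default", "aws_secret_access_key", fallback=None))
    return None  # a real chain would next query the EC2 instance metadata

# Environment variables win when both sources are present.
assert find_aws_credentials(
    env={"AWS_ACCESS_KEY_ID": "AKexample", "AWS_SECRET_ACCESS_KEY": "secret"},
    home=Path("/nonexistent")) == ("AKexample", "secret")
```

The function names and the restriction to the "default" profile are illustrative choices; the point is the precedence order, not the exact API.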
<br />
<div style="text-align: justify;">
The existing read-only <a href="http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsigs">/vsigs/</a> VFS for Google Cloud Storage has been extended with write capabilities (creation of new files), to reach feature parity with /vsis3/. The authentication methods have also been extended to support OAuth2 authentication with a refresh token, or with service account authentication. The credentials can be stored in a ~/.boto configuration file. And when run from a Google Compute Engine virtual machine, GDAL will automatically fetch the instance profile credentials.</div>
<br />
<div style="text-align: justify;">
Two new VFS have also been added, <a href="http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsiaz">/vsiaz/</a> for Microsoft Azure Blobs and <a href="http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsioss">/vsioss/</a> for Alibaba Cloud Object Storage Service. They support read and write operations similarly to the two previously mentioned VFS. </div>
<br />
<br />
To make file and directory management easy, a number of Python sample scripts have been created or improved:<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/gdal_cp.py">gdal_cp.py</a> my.tif /vsis3/mybucket/raster/<br />
gdal_cp.py -r /vsis3/mybucket/raster /vsigs/somebucket<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/gdal_ls.py">gdal_ls.py</a> -lr /vsis3/mybucket<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/gdal_rm.py">gdal_rm.py</a> /vsis3/mybucket/raster/my.tif<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/gdal_mkdir.py">gdal_mkdir.py</a> /vsis3/mybucket/newdir<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/gdal_rmdir.py">gdal_rmdir.py</a> -r /vsis3/mybucket/newdir<br />
<br />
<h3>
Cloud Optimized GeoTIFF</h3>
<br />
<div style="text-align: justify;">
Over the past few months, there has been <a href="http://www.cogeo.org/">adoption by various actors</a> of the <a href="https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF">cloud optimized formulation</a> of GeoTIFF files, which enables clients to efficiently open and access portions of a GeoTIFF file available through HTTP GET range requests.<br />
<br />
<a href="https://github.com/rouault/cog_validator">Source code for an online service</a> that offers validation of cloud optimized GeoTIFF (using GDAL and the <a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py">validate_cloud_optimized_geotiff.py</a> script underneath) and can run as an AWS Lambda function is available. Note: as the current definition of what is or is not a cloud optimized formulation has been unilaterally decided up to now, it cannot be excluded that it might change on some points (for example relaxing constraints on the ordering of the data of each overview level, or enforcing that tiles are ordered in a top-to-bottom, left-to-right way).<br />
<br />
GDAL trunk has received improvements to speed up access to sub-windows of a GeoTIFF file, by making sure that the tiles that participate in a sub-window of interest are requested in parallel (this is true for public files accessed through /vsicurl/ or with the four above-mentioned specialized cloud VFS), by reducing the amount of data fetched to the strict minimum, and by merging requests for consecutive ranges. In some environments, particularly when accessing the storage service of a virtual machine of the same provider, HTTP/2 can be used by setting the GDAL_HTTP_VERSION=2 configuration option (provided you have a <a href="https://curl.haxx.se/docs/http2.html">libcurl recent enough</a> and built against nghttp2). In that case, HTTP/2 multiplexing will be used to request and retrieve data on the same HTTP connection (saving the time to establish TLS, for example). Otherwise, GDAL will default to several parallel HTTP/1.1 connections. For long lived processes, efforts have been made to re-use existing HTTP connections as much as possible.<br />
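The "merging requests for consecutive ranges" idea can be illustrated with a small helper (a simplified sketch, not GDAL's actual code, which also handles gaps and parallelism): given the (offset, size) byte ranges of the tiles intersecting the window of interest, coalesce ranges that touch each other so each merged run can be fetched with a single HTTP GET range request.

```python
def merge_consecutive_ranges(ranges):
    """Coalesce (offset, size) byte ranges that are exactly contiguous.
    Simplified sketch of the request-merging strategy described above."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This range starts exactly where the previous one ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((offset, size))
    return merged
```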
<br /></div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-82822371872946784002017-03-11T14:25:00.000-08:002017-03-11T14:25:35.348-08:00Dealing with huge vector GeoPackage databases in GDAL/OGR<div style="text-align: justify;">
Recently, I've fixed a bug in the OGR <a href="http://gdal.org/drv_openfilegdb.html">OpenFileGDB driver</a>, the driver made from reverse engineering the ESRI FileGeoDatabase format, so as to be able to read tables whose section enumerating and describing fields is located beyond the first 4 GB of the file. This <a href="https://www2.census.gov/geo/tiger/TGRGDB16/tlgdb_2016_a_us_edges.gdb.zip">table from the 2016 TIGER database</a> indeed features all linear edges of the USA and is 15 GB in size (feature and spatial indexes included), with 85 million features. <br /><br />Some time before, I had to deal with a smaller database - 1.7 GB as GeoPackage - with 5.4 million polygons (bounding box) from the cadastre of an Italian province. One issue I noticed is that when you want to get the summary of the layer, with ogrinfo -al -so the.gpkg, it was very slow. The reason is that this summary includes the feature count, and there's no way to get this metadata quickly, apart from running the "SELECT COUNT(*) FROM the_table" request, which causes a full scan of the table. For small databases, this runs fast, but when going into the gigabyte realm, this can take several dozen seconds. But getting the spatial extent of the layer, which is another piece of information displayed by the summary mode of ogrinfo, is fast, since the gpkg_contents "system" table of a GeoPackage database includes the bounding box of the table. So my idea was to extend the definition of the gpkg_contents table to add a new column, ogr_feature_count, to store the feature count. I went to implement that, and it worked fine. The synchronization of the value of ogr_feature_count after edits can be done with 2 SQLite triggers, on row insertion and deletion, and that works with implementations that are not aware of the existence of this new column. Like older OGR versions. Unfortunately it appears that at least one other implementation completely rejected such databases.
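The trigger-based bookkeeping just described can be sketched with Python's built-in sqlite3 module. Table, column and trigger names below are illustrative placeholders, not the exact DDL emitted by the OGR driver:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (fid INTEGER PRIMARY KEY, geom BLOB);
-- Side table caching the feature count (the first, rejected design stored it
-- as an extra column of gpkg_contents instead)
CREATE TABLE gpkg_ogr_contents (table_name TEXT PRIMARY KEY, feature_count INTEGER);
INSERT INTO gpkg_ogr_contents VALUES ('the_table', 0);
-- The two triggers that keep feature_count synchronized with the data table
CREATE TRIGGER trigger_insert_feature_count AFTER INSERT ON the_table
BEGIN
  UPDATE gpkg_ogr_contents SET feature_count = feature_count + 1
  WHERE table_name = 'the_table';
END;
CREATE TRIGGER trigger_delete_feature_count AFTER DELETE ON the_table
BEGIN
  UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1
  WHERE table_name = 'the_table';
END;
""")
conn.executemany("INSERT INTO the_table VALUES (?, ?)",
                 [(i, None) for i in range(5)])
conn.execute("DELETE FROM the_table WHERE fid = 0")
# The cached count is now available without a full table scan
count = conn.execute(
    "SELECT feature_count FROM gpkg_ogr_contents WHERE table_name = 'the_table'"
).fetchone()[0]
```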
There is some inconsistency in the <a href="http://www.geopackage.org/spec/">GeoPackage specification</a> about whether additional columns are accepted or not in system tables. From the /base/core/contents/data/table_def test case, "Column order, check constraint and trigger definitions, and other column definitions in the returned sql are irrelevant.", it would seem that a GeoPackage with additional columns should still be considered valid. Anyway, that's only the theory, and we don't want to break interoperability for just a nice-to-have feature... So I went to change the design a bit and created a new table, gpkg_ogr_contents, with table_name and feature_count columns. I'm aware that I should not borrow the gpkg_ prefix, but I felt it was safer to do so since other implementations will probably ignore any unknown gpkg_ prefixed table. And the addition of the ogr_ prefix makes collisions with future extensions of the GeoPackage specification unlikely. The content of this table is also kept in synchronization with the data table thanks to two triggers, and this makes the other software that rejected my first attempt happy. Problem solved.<br /><br />Let's come back to our 15 GB FileGeoDatabase. My first attempt to convert it to GeoPackage with ogr2ogr resulted in converting the features in about half an hour, but once this 100% stage was reached, the finalization, which includes building the spatial index, took ages. So long that after a whole night it wasn't yet finished, and it was seriously making the computer unresponsive due to massive I/O activity. In the GeoPackage driver, the spatial index is indeed created after feature insertion, so that the feature table and spatial index tables are well separated in the file, and from previous experiments with the Spatialite driver, this proved to be the right strategy.
Populating the <a href="https://www.sqlite.org/rtree.html">SQLite R-Tree</a> is done with a simple statement: INSERT INTO my_rtree SELECT fid, ST_MinX(geom), ST_MaxX(geom), ST_MinY(geom), ST_MaxY(geom) FROM the_table. Analyzing what happens in the SQLite code is not easy when you are not familiar with that code base, but my intuition is that there was constant back and forth between the geometry data area and the R-Tree area in the file, making the SQLite page cache inefficient. So I decided to experiment with a more progressive approach: let's iterate over the feature table and collect the fid, minx, maxx, miny, maxy values by chunks of 100 000 rows, then insert those 100 000 bounding boxes into the R-Tree, and loop again until we have completely read the feature table. With such a strategy, the spatial index can now be built in 4h30. The resulting GeoPackage file weighs 31.6 GB, so twice as large as the FileGeoDatabase. One of the reasons for the difference must be that geometries in FileGeoDatabase are compressed (quantization for coordinate precision, delta encoding and use of variable-length integers) whereas GeoPackage uses an uncompressed SQLite BLOB based on OGC WKB.<br />My first attempt at opening it in QGIS resulted in the UI being frozen, probably for hours. The reason is that QGIS always issues a spatial filter, even when requesting an area of interest that is at least as large as the extent of the layer, where there is no performance gain to expect from using it. So the first optimization was, in the OGR GeoPackage driver, to detect that situation and not translate the OGR spatial filter as a SQLite R-Tree filter. QGIS could now open the database and progressively display the features. Unfortunately when zooming in, the UI became frozen again. When applying a spatial filter, the GeoPackage driver created a SQL request like the following one:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;">SELECT * FROM the_table WHERE fid IN </span></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;"> (SELECT id FROM the_rtree WHERE </span></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;"> xmin <= bbox_xmax AND xmax >= bbox_xmin AND</span></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;"> ymin <= bbox_ymax AND ymax >= bbox_ymin)</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It turns out that the sub-select (the one that fetches the feature IDs from the spatial index) is apparently entirely run before the outer select (the one that returns geometry and attributes) starts being evaluated. This way of expressing the spatial filter came from the Spatialite driver (since GeoPackage and Spatialite use the exact same mechanisms for spatial indexing), itself based on examples from an old <a href="https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8">Spatialite tutorial</a>. For databases that are not too big, this runs well. After some experimentation, it turns out that doing a JOIN between the feature table and the R-Tree virtual table makes it possible to have a non-blocking request:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;">SELECT * FROM the_table t JOIN the_rtree r ON t.fid = r.id</span></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;">WHERE r.xmin <= bbox_xmax AND r.xmax >= bbox_xmin AND</span></div>
<div style="text-align: justify;">
<span style="font-family: "Courier New",Courier,monospace;"> r.ymin <= bbox_ymax AND r.ymax >= bbox_ymin</span></div>
<div style="text-align: justify;">
<br /></div>
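The two formulations can be compared with Python's sqlite3 module. A plain table stands in for the R-Tree here (the real GeoPackage index is a SQLite R-Tree virtual table, and the table names are the post's placeholders), which is enough to show that both queries select the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (fid INTEGER PRIMARY KEY, name TEXT);
-- Plain stand-in for the R-Tree virtual table: one bounding box per feature
CREATE TABLE the_rtree (id INTEGER PRIMARY KEY,
                        xmin REAL, xmax REAL, ymin REAL, ymax REAL);
INSERT INTO the_table VALUES (1, 'inside'), (2, 'outside');
INSERT INTO the_rtree VALUES (1, 0, 1, 0, 1), (2, 10, 11, 10, 11);
""")
bbox = {"xmin": 0.5, "xmax": 2.0, "ymin": 0.5, "ymax": 2.0}  # spatial filter
# Blocking formulation: the sub-select is fully evaluated first
sub = conn.execute("""
    SELECT * FROM the_table WHERE fid IN
      (SELECT id FROM the_rtree WHERE
       xmin <= :xmax AND xmax >= :xmin AND
       ymin <= :ymax AND ymax >= :ymin)""", bbox).fetchall()
# Streaming formulation: JOIN between the feature table and the index
joined = conn.execute("""
    SELECT t.fid, t.name FROM the_table t JOIN the_rtree r ON t.fid = r.id
    WHERE r.xmin <= :xmax AND r.xmax >= :xmin
      AND r.ymin <= :ymax AND r.ymax >= :ymin""", bbox).fetchall()
```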
<div style="text-align: justify;">
Now QGIS is completely responsive, although I find that even at high zoom levels, the performance is somewhat disappointing, i.e. features appear rather slowly. There seems to be some threshold effect on the size of the database, since the performance is rather good on the Italian province cadastral use case.<br /><br />Another experiment showed that increasing the SQLite page size from 1024 bytes (the default in SQLite 3.11 or earlier) to 4096 bytes (<a href="https://www.sqlite.org/pgszchng2016.html">the default since SQLite 3.12</a>) decreases the database size to 28.8 GB. This new page size of 4096 bytes is now used by default by the OGR SQLite and GPKG drivers (unless OGR_SQLITE_PRAGMA=page_size=xxxx is specified as a configuration option).<br /><br />I also discovered that increasing the SQLite page cache from its 2 MB default to 2 GB (with --config OGR_SQLITE_CACHE 2000) significantly improved the time to build the spatial index, decreasing the total conversion time from 4h30 to 2h10. 2 GB is just a value selected at random; it might be too large, or perhaps a larger value would help further.<br /><br />All improvements mentioned above (faster spatial index creation, better use of the spatial index and change of the default page size) are now in GDAL trunk, and will be available in the upcoming 2.2.0 release.</div>
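At the SQLite level, the two tunings discussed above map to pragmas, which can be shown with a minimal sqlite3 sketch (the values are the ones from the post; note that page_size must be set before any content is written to the database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Page size must be set before the first table is created
conn.execute("PRAGMA page_size = 4096")
# A negative cache_size is expressed in KiB; -2000000 is roughly the 2 GB
# page cache used for the spatial index build experiment
conn.execute("PRAGMA cache_size = -2000000")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
```

With the OGR drivers, the same effect is obtained through the OGR_SQLITE_PRAGMA and OGR_SQLITE_CACHE configuration options mentioned above rather than by issuing pragmas directly.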
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com3tag:blogger.com,1999:blog-7797431205725571762.post-53628252277893690812016-09-19T03:30:00.000-07:002016-09-19T03:52:31.583-07:00Running FreeBSD in Travis-CI<div style="text-align: justify;">
Note for geospatial focused readers: this article has little to do with geo, although it is applied to GDAL, but more with software virtualization, hacks, software archeology and the value of free software. Note for virtualization experts: I'm not one, so please bear with my approximate language and inaccuracies.</div>
<br />
<div style="text-align: justify;">
Travis-CI is a popular <span class="st">continuous integration platform that can be easily used with software projects hosted at GitHub. Travis-CI has a free offer for software with a public repository at GitHub. Travis-CI provides cloud instances running Linux or Mac OS X. To increase the portability testing of GDAL, I wondered if it was somehow possible to run another operating system with Travis-CI, for example FreeBSD. A search led me to this <a href="https://github.com/travis-ci/travis-ci/issues/1818">question</a> in their bug tracker, but the outcome seems to be that it is not possible, nor is it in their medium or long term plans.</span></div>
<span class="st"><br /></span>
<br />
<div style="text-align: justify;">
<span class="st">One idea that came quickly to mind was to use the <a href="http://wiki.qemu.org/Main_Page">QEMU</a> machine emulator that can simulate full machines (CPU, peripherals, memory, etc.) of several hardware architectures (Intel x86, ARM, MIPS, SPARC, PowerPC, etc.). To run QEMU, you mostly need a virtual hard drive, i.e. a file that replicates the content of the hard disk of the virtual machine you want to run. I found <a href="https://www.stacklet.com/downloads/KVM-Disk-Image-FreeBSD-9.2-Lightweight-x86_64">here</a> a small ready-to-use x86_64 image of FreeBSD 9.2, with one nice property: the ssh server and DHCP are automatically started, making it possible to remotely connect to it.</span></div>
<div style="text-align: justify;">
<span class="st"><br /></span></div>
<div style="text-align: justify;">
<span class="st">So starting with a Travis-CI Ubuntu Trusty (14.04) image, here are the steps to launch our FreeBSD guest:</span></div>
<div style="text-align: justify;">
</div>
<br />
<pre class="ansi" id="log"><span id="1-595">sudo apt-get install qemu</span><span id="1-595"><span id="1-913">
wget ftp://ftp.stacklet.com/archive/x86-64/FreeBSD/9.2/\
freebsd.9-2.x86-64.20140103.raw.img.txz</span></span><span id="1-595"><span id="1-913"><span id="1-1012">
tar xJvf freebsd.9-2.x86-64.20140103.raw.img.txz</span></span></span><span id="1-595"><span id="1-913"><span id="1-1012"><span id="1-1019">
qemu-system-x86_64 -daemonize -display none \</span></span></span></span><span id="1-595"><span id="1-913"><span id="1-1012"><span id="1-1019">
freebsd.9-2.x86-64.20140103.raw.img \</span></span></span></span><span id="1-595"><span id="1-913"><span id="1-1012"><span id="1-1019">
-m 1536 -smp 4 -net user,hostfwd=tcp::10022-:22 -net nic</span></span></span></span></pre>
<pre class="ansi" id="log"><span id="1-595"><span id="1-913"><span id="1-1012"><span id="1-1019">
</span></span></span></span></pre>
The qemu invocation starts the virtual machine as a daemon without display, turns on networking, and asks for the guest (i.e. FreeBSD) TCP port 22 (the ssh port) to be accessible from the host (Linux Trusty) as port 10022.<br />
<br />
<div style="text-align: justify;">
To ssh into the VM, there's one slight inconvenience: ssh login requires a password. The root password for this VM is "password". But ssh is secured and doesn't accept the password being provided through files or piped in with "echo". I found that the <a href="https://sourceforge.net/projects/sshpass/">sshpass</a> utility was designed to overcome this in situations where security isn't really what matters. However, the version of sshpass bundled with Ubuntu Trusty didn't work with the corresponding ssh version (not surprisingly, since the authors of sshpass mention that it is full of assumptions about how ssh works, which can easily break with changes to ssh). I found that the latest version, 1.06, worked however.</div>
<div style="text-align: justify;">
<br />
With 4 extra lines, we can now log into our FreeBSD instance:<br />
<br />
<br /></div>
<div style="text-align: justify;">
<pre class="ansi" id="log"><span id="1-1025">wget http://fossies.org/linux/privat/sshpass-1.06.tar.gz</span><span id="1-1045">
tar xzf sshpass-1.06.tar.gz</span><span id="1-1051">
cd sshpass-1.06 && ./configure && make -j3 && cd ..</span><span id="1-1051"><span id="1-1147">
export MYSSH="sshpass-1.06/sshpass -p password ssh \
-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
root@localhost -p10022"</span> </span></pre>
</div>
<br />
So now we can configure our FreeBSD VM a bit, installing with the 'pkg' package manager a few dependencies to build GDAL:<br />
<br />
<pre class="ansi" id="log"><span id="1-1159">$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg bootstrap'</span><span id="1-1159"><span id="1-1169">
$MYSSH 'mkdir /etc/pkg'</span></span><span id="1-1159"><span id="1-1169"><span id="1-1176">
sshpass-1.06/sshpass -p password scp \
-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \</span></span></span><span id="1-1159"><span id="1-1169"><span id="1-1176">
-P 10022 FreeBSD.conf root@localhost:/etc/pkg/FreeBSD.conf</span></span></span><span id="1-1159"><span id="1-1169"><span id="1-1176"><span id="1-1194">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg update'</span></span></span></span><span id="1-1159"><span id="1-1169"><span id="1-1176"><span id="1-1194"><span id="1-1206">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install gmake'</span></span></span></span></span><span id="1-1242">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install python27'</span><span id="1-1242"><span id="1-1281">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install py27-numpy'</span></span><span id="1-1242"><span id="1-1281"><span id="1-1372">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install sqlite3 curl'</span></span></span><span id="1-1242"><span id="1-1281"><span id="1-1372"><span id="1-1429">
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install expat'</span></span></span></span></pre>
<pre class="ansi" id="log"><span id="1-1242"><span id="1-1281"><span id="1-1372"><span id="1-1429"> </span> </span> </span> </span></pre>
<div style="text-align: justify;">
Here we go: ./configure && make ! That works, but 50 minutes later (the maximum length of a Travis-CI job), our <a href="https://travis-ci.org/rouault/gdal_coverage/builds/160635506">job is killed</a> with perhaps only 10% of the GDAL code base being compiled. The reason is that we used the pure software emulation mode of QEMU, which involves on-the-fly disassembling and re-assembling of the code to be run. QEMU can for example emulate an ARM guest on an Intel host, and vice-versa, and there are no real shortcuts when the guest and host architectures are the same. So your guest can typically run 10 times slower than it would on a real machine with its native architecture. Actually, that's not entirely true, since with the addition of CPU instructions dedicated to virtualization (VT-x for Intel, AMD-V for AMD), a hypervisor called <a href="http://www.linux-kvm.org/page/Main_Page">KVM</a> (Kernel Virtual Machine) was added to the Linux kernel, and QEMU can use KVM to implement the above-mentioned shortcuts to reach near bare-metal performance. It just takes using 'kvm' instead of 'qemu-system-x86_64'. <a href="https://travis-ci.org/rouault/gdal_coverage/builds/160579884">Let's do that</a> ! Sigh, our attempt fails miserably with a "failed to initialize KVM" error message. If we display the content of /proc/cpuinfo, we get:</div>
<div style="text-align: justify;">
<br /></div>
<pre class="ansi" id="log"><span id="1-698">flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1</span><span id="1-698">
sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
fsgsbase bmi1 avx2 smep bmi2 erms xsaveopt</span></pre>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A lot of nice-to-have things, but the important thing to notice is the absence of the 'vmx' (Intel virtualization instruction set) and 'svm' (similar for AMD) flags. So this machine has no hardware virtualization capabilities ! Or more precisely, this *virtual* machine has no such capabilities. The <a href="https://docs.travis-ci.com/user/trusty-ci-environment/">documentation of the Trusty Travis-CI environment</a> mentions they are based on Google Compute Engine as the hypervisor, and <a href="http://stackoverflow.com/questions/26722603/can-kvm-be-used-inside-a-gce-instance">apparently it does not allow</a> (or is not configured to allow) nested virtualization, despite <a href="https://cloud.google.com/compute/docs/faq">GCE being based on KVM</a>, and <a href="https://www.kernel.org/doc/Documentation/virtual/kvm/nested-vmx.txt">KVM potentially allowing nested virtualization</a>. GCE allows Docker to run inside VMs, but Docker only runs Linux "guests". So it seems we are really stuck.</div>
<br />
<div style="text-align: justify;">
Here comes the time for good old memories and a bit of software archeology. QEMU was started by <a href="http://bellard.org/">Fabrice Bellard</a>. If you didn't know his name yet: F. Bellard created FFMPEG and QEMU, holds a world record for the number of decimals of Pi computed on a COTS PC, has ported QEMU to JavaScript to run the <a href="http://bellard.org/jslinux/">Linux kernel in your browser</a>, devised BPG, a new image format based on HEVC, etc.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
At the time when his interest was focused on QEMU, he created <a href="http://wiki.qemu.org/KQemu/Doc">KQemu</a>, a kernel module (for Linux, Windows, FreeBSD hosts) that could significantly enhance QEMU performance when the guest and host are x86/x86_64. KQemu requires QEMU to be modified to communicate with the kernel module (similarly to the working of QEMU with the KVM kernel module). KQemu started as a closed source project and was eventually released as GPL v2. One of the key features of KQemu is that it does not require (nor use) hardware virtualization instructions. KQemu software virtualization involves complicated tricks, particularly for code in the guest that runs in "Ring 0", i.e. with the highest privileges, which you must patch to run as Ring 3 (non-privileged) code in the host. You can get an idea of what is involved by reading the <a href="https://www.virtualbox.org/manual/ch10.html#idm9771">documentation of VirtualBox regarding software virtualization</a>. I will not pretend that QEMU+KQemu did the exact same tricks as VirtualBox, but that should give you at least a picture of the challenges involved. This complexity is what eventually led to KQemu being abandoned when CPUs with hardware virtualization became widespread on the market, since KVM based virtualization is much cleaner to implement. Starting with QEMU 0.12.0, KQemu support was finally dropped from the QEMU code base.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Due to KQemu not using hardware virtualization instructions, there is good hope that it can run inside a virtualized environment. So let's have a try with QEMU 0.11.1 and KQemu 1.4.0pre. Compiling QEMU 0.11.1 on Ubuntu Trusty goes quite well, except for a linking error easily fixed with this trivial <a href="https://github.com/rouault/gdal_coverage/blob/freebsd9.2/qemu.patch">patch</a>. Building KQemu is a bit more involved, it being a kernel module, with the (internal) Linux kernel API prone to change from time to time. One piece of good news is that the Linux-specific part of kqemu is a relatively small file, and the API breaks were limited to 2 aspects. The way to get the memory management structure of the current task had changed in Linux 2.6.23, and I found this simple <a href="https://crux.nu/ports/crux-2.4/opt/kqemu/kqemu.patch">patch</a> to solve it. Another change that occurred in a later Linux release is the removal of kernel semaphores, replaced by mutexes. My cumulated patch to fix all compilation issues is <a href="https://github.com/rouault/gdal_coverage/blob/freebsd9.2/kqemu.patch">here</a>. I don't pretend that it is technically correct, as my knowledge of kernel internals is more than limited, but a local test seemed to confirm that adding -enable-kqemu to the qemu command line worked sufficiently well to start and do things in the FreeBSD VM, and at a very decent speed. I tried the -kernel-kqemu switch that turns on KQemu acceleration for kernel guest code, but that resulted in a crash of qemu near the end of the boot process of FreeBSD. Which is not surprising, as -kernel-kqemu makes some assumptions on the internal workings of the guest OS, which perhaps FreeBSD does not meet. Or perhaps this is just a bug of qemu/kqemu.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Running it on Travis-CI was successful too, with the compilation being done in 20 minutes, so probably half the speed of bare metal, which is good enough. kqemu does not support SMP guests (but this was listed in the potential "roadmap", so probably achievable), but if we wanted to speed up compilation, we could potentially launch 2 kqemu-enabled qemu instances (the Travis-CI VMs have 2 cores available) that would compile different parts of the software, with the build tree hosted on an NFS share. I said that compilation goes fine, except that the build process (actually the qemu instance) crashes at libgdal.so building time (I can also reproduce that locally). This is probably because the history of qemu & kqemu wasn't long enough to go from beta quality to production quality. I've worked around this issue by only doing the compilation in -enable-kqemu mode, restarting the VM in pure software emulation to do the linking, and then restarting in -enable-kqemu mode. Unfortunately running the GDAL Python autotest suite in kqemu mode also leads to a qemu crash (due to the design of kqemu only running code in ring 3, crashes do not affect the host), and running it completely in pure emulation mode reaches the 50 minute time-out, so for the sake of this demonstration, I only run one of the test files. And now we have our first <a href="https://travis-ci.org/rouault/gdal_coverage/builds/160959790">successful build</a> with this <a href="https://github.com/rouault/gdal_coverage/blob/freebsd9.2/.travis.yml">build recipe</a>.</div>
<br />
<div style="text-align: justify;">
I could also have tried VirtualBox because, as mentioned above, it supports software virtualization with acceleration. But that is only for 32-bit guests (and I didn't find a ready-made FreeBSD 32-bit image that you can directly ssh into). For 64-bit guests, VirtualBox requires hardware virtualization to be available in the host. To the best of my knowledge, KQemu is (was) the only solution to enable acceleration of 64-bit guests without hardware requirements.</div>
<br />
<div style="text-align: justify;">
My main conclusion of this experiment is that it is a striking example of a key advantage of the open source model. If kqemu had not been released as GPL v2, I would never have been able to resurrect it and modify it to run on newer kernels (actually there was also <a href="http://savannah.nongnu.org/projects/qvm86/">QVM86</a>, an attempt at developing an alternative to KQemu while KQemu was still closed source, which was abandoned when VirtualBox was open sourced).</div>
<div style="text-align: justify;">
<br /></div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-37119985188755823892016-07-19T04:25:00.002-07:002016-07-19T04:25:18.859-07:00Speeding up computation of raster statistics using SSE-2/AVX-2<div style="text-align: justify;">
GDAL offers a method ComputeStatistics() that, given a raster band, returns the minimum and maximum values of pixels, the mean value and the standard deviation.<br />
<br />
For those not remembering how to compute mean and standard deviations, the basic formulas for values indexed from 0 to N-1 are :</div>
<blockquote class="tr_bq">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">mean = sum(value(i) for i = 0 to N-1) / N</span></span></blockquote>
<blockquote>
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = square root of the mean of the squared differences of values to the mean<br />std_dev = sqrt(sum(i = 0 to N-1, (value(i) - mean)^2) / N)</span></span></blockquote>
<div style="text-align: justify;">
A very naive version would first compute the mean, and in a second pass compute the standard deviation.</div>
<div style="text-align: justify;">
<br /></div>
But it can be easily proven (by expanding the <span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">(value(i) - mean)^2</span></span> term) that it is also equivalent to:<br />
<blockquote class="tr_bq">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(sum(i = 0 to N-1, value(i)^2)/N - mean^2)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(mean_of_square_values - square_of_mean)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(sum(i = 0 to N-1, value(i)^2)/N - (sum_of_values/N)^2)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(N^2 *(sum(i = 0 to N-1, value(i)^2)/N - (sum_of_values/N)^2)) / N</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(N * sum_of_square_values - sum_of_values^2) / N</span></span></blockquote>
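The equivalence can be checked numerically in a few lines of Python (sample values chosen arbitrarily; both forms agree on such well-conditioned data):

```python
import math

values = [1.5, 2.0, 4.5, 4.0, 3.25]
n = len(values)
mean = sum(values) / n

# Two-pass definition: square root of the mean of squared deviations
std_dev_two_pass = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

# Single-pass rearrangement derived above:
# sqrt(N * sum_of_square_values - sum_of_values^2) / N
sum_of_values = sum(values)
sum_of_square_values = sum(v * v for v in values)
std_dev_one_pass = math.sqrt(n * sum_of_square_values - sum_of_values ** 2) / n
```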
<div style="text-align: justify;">
A less naive implementation would compute the sum of values and the sum of square values in a single pass. However, the standard deviation computed like that might be subject to numeric instability, given that even if the result is small, sum_of_square_values and sum_of_values can be very large for a large number of pixels, and thus, when represented with floating point numbers, the difference between both terms can be inaccurate.</div>
<br />
<h4>
Welford algorithm</h4>
<div style="text-align: justify;">
So in recent GDAL versions, the computation of the mean and standard deviation is done in a progressive and numerically stable way, thanks to the <a href="https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm">Welford algorithm</a></div>
<br />
The generic code is:<br />
<blockquote>
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">pixel_counter = 0</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">mean = 0</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">M2 = 0</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">foreach(value in all pixels):</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> if value < minimum or pixel_counter == 0: minimum = value</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> if value > maximum or pixel_counter == 0: maximum = value</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> pixel_counter = pixel_counter + 1</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> delta = value - mean</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> mean = mean + delta / pixel_counter</span></span><br />
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;"> M2 = M2 + delta * (value - mean);</span></span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt( M2 / pixel_counter )</span></span></blockquote>
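For illustration, the pseudo-code above can be transcribed into a small self-contained C++ function (a sketch, not the actual GDAL implementation):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats
{
    double minimum;
    double maximum;
    double mean;
    double std_dev;
};

// Single-pass Welford computation of min/max/mean/stddev, transcribing the
// pseudo-code above (illustrative sketch, not GDAL's actual code).
static Stats ComputeStats(const std::vector<double>& values)
{
    Stats s{0.0, 0.0, 0.0, 0.0};
    double M2 = 0.0;
    std::size_t pixel_counter = 0;
    for (const double value : values)
    {
        if (pixel_counter == 0 || value < s.minimum) s.minimum = value;
        if (pixel_counter == 0 || value > s.maximum) s.maximum = value;
        ++pixel_counter;
        const double delta = value - s.mean;
        s.mean += delta / pixel_counter;
        M2 += delta * (value - s.mean);
    }
    s.std_dev = pixel_counter ? std::sqrt(M2 / pixel_counter) : 0.0;
    return s;
}
```

On ill-conditioned input such as 10000 values alternating between 1e8 and 1e8 + 1 (true standard deviation 0.5), this stays accurate, whereas the sqrt(mean_of_square_values - square_of_mean) formula evaluated in double precision loses most of its significant digits.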
<br />
<h4>
Proof of Welford algorithm</h4>
(You can skip this paragraph and still follow the rest of this article)<br />
<br />
The magic of the Welford algorithm lies in the following recurrence relations.<br />
<br />
For the mean, it is rather obvious:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">N*mean(N) = sum(i = 0 to N-1, value(i))<br />N*mean(N) = sum(i = 0 to N-2, value(i)) + value(N-1)<br />N*mean(N) = (N-1) * mean(N-1) + value(N-1)<br />mean(N) = (N-1)/N * mean(N-1) + value(N-1)/N<br />mean(N) = mean(N-1) + (value(N-1) - mean(N-1)) / N</span></span><br />
Hence mean = mean + delta / pixel_counter<br />
<br />
For the standard deviation, the proof is a little more lengthy:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">N*stddev(N)^2 = sum(i=0 to N-1, (value(i) - mean(N))^2 )<br /><br />N*stddev(N)^2 = sum(i=0 to N-1, (value(i) - (mean(N-1) + (value(N-1) - mean(N-1)) / N))^2 )<br /><br />N*stddev(N)^2 = sum(i=0 to N-1, ((value(i) - mean(N-1)) - (value(N-1) - mean(N-1)) / N)^2 )<br /><br />N*stddev(N)^2 = sum(i=0 to N-1, (value(i) - mean(N-1))^2 + ((value(N-1) - mean(N-1)) / N)^2<br /> - 2 * (value(i) - mean(N-1)) * (value(N-1) - mean(N-1)) / N )<br /><br />N*stddev(N)^2 = sum(i=0 to N-1, (value(i) - mean(N-1))^2) + N * ((value(N-1) - mean(N-1)) / N)^2<br /> - 2 * sum(i=0 to N-1, value(i) - mean(N-1)) * (value(N-1) - mean(N-1)) / N<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1))^2<br /> + N * ((value(N-1) - mean(N-1)) / N)^2<br /> - 2 * N * (mean(N) - mean(N-1)) * (value(N-1) - mean(N-1)) / N<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1))^2 * (1 + 1 / N)<br /> - 2 * (mean(N) - mean(N-1)) * (value(N-1) - mean(N-1))<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1)) *<br /> ((1 + 1 / N) * (value(N-1) - mean(N-1)) - 2 * (mean(N) - mean(N-1)))<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1)) *<br /> (value(N-1) - mean(N-1) + (value(N-1) - mean(N-1)) / N - 2 * (mean(N) - mean(N-1)))<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1)) *<br /> (value(N-1) - mean(N-1) - (mean(N) - mean(N-1)))<br /><br />N*stddev(N)^2 = (N-1)*stddev(N-1)^2 + (value(N-1) - mean(N-1)) * (value(N-1) - mean(N))</span></span><br />
<br />
Hence M2 = M2 + delta * (value - mean)<br />
<br />
<h4>
Integer based computation of standard deviation</h4>
<div style="text-align: justify;">
The Welford algorithm is good, but it involves floating point operations for each pixel to compute the progressive mean and variance, whereas with the original formulas we would fundamentally need floating point operations only at the end, and could use integer arithmetic for the rest. Another drawback of the Welford approach is that it prevents any direct parallelization (there might still be ways to reconcile partial computations, but I have not explored those), whereas if you have a set of pixels, you can conceptually divide it into as many subsets as you want, and for each subset compute its local minimum, maximum, sum of values and sum of square values. Merging subsets is then trivial: take the minimum of the minimums, the maximum of the maximums, the sum of the sums of values and the sum of the sums of square values.</div>
<br />
<div style="text-align: justify;">
Let us consider the case of pixels whose type is unsigned byte, i.e. with values in the range [0,255]. We want to compute</div>
<blockquote class="tr_bq">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">std_dev = sqrt(N * sum_of_square_values - sum_of_values^2) / N</span></span></blockquote>
<div style="text-align: justify;">
For practical reasons, we want N, sum_of_square_values and sum_of_values to fit on a 64bit unsigned integer (uint64), which is the largest natural integral type that can be easily and efficiently used on today's CPUs. The most limiting factor is sum_of_square_values. Given that in the worst case a square value is equal to 255*255, the maximum number of pixels N we can address is (2^64-1) / (255*255) = 283 686 952 306 183, which is large enough to represent a raster of 16 million pixels x 16 million pixels. Good enough.</div>
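That bound can be double-checked at compile time with a couple of assertions (an illustrative snippet, not GDAL code; the constant names are mine):

```cpp
#include <cstdint>

// Worst-case squared byte value, and the largest pixel count N for which
// N * 255 * 255 still fits in a uint64 sum-of-squares accumulator.
constexpr uint64_t kMaxSquareByteValue = 255ULL * 255ULL;
constexpr uint64_t kMaxPixelCount = UINT64_MAX / kMaxSquareByteValue;

// The figure quoted in the text, and the 16 million x 16 million claim.
static_assert(kMaxPixelCount == 283686952306183ULL, "bound mismatch");
static_assert(kMaxPixelCount >= 16000000ULL * 16000000ULL,
              "a 16 million x 16 million pixel raster fits");
```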
<br />
<div style="text-align: justify;">
We now need to be able to multiply two uint64 values and get the result as a uint128, and to compute the difference of two uint128 values. On Intel/AMD CPUs in 64bit mode, the multiplication natively yields a 128 bit wide result; it is just that there is no standardized way in C/C++ to get at it. With the GCC compiler in 64 bit mode, the __uint128_t type can be used in a transparent way to do that:</div>
<blockquote class="tr_bq">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">__uint128_t result = (__uint128_t)operand_64bit * other_operand_64bit</span></span></blockquote>
<div style="text-align: justify;">
For Visual Studio compilers in 64 bit mode, the dedicated _umul128() intrinsic is available.</div>
<br />
<div style="text-align: justify;">
What about non-Intel or non-64bit CPUs? In that case, we have to do the multiplication by hand, by decomposing each uint64 value into its lower and upper uint32 parts, doing 4 uint32*uint32->uint64 multiplications, summing the intermediate results, handling the carries and building the resulting number. Not very efficient, but we do not really care, since it is just a final operation.</div>
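A sketch of such a decomposition (a hypothetical helper of my own, not the actual GDALUInt128 code) could look like this:

```cpp
#include <cstdint>
#include <utility>

// Portable 64x64 -> 128 bit unsigned multiplication built from four
// 32x32 -> 64 bit products, illustrating what a fallback implementation
// can do when no native 128 bit type is available.
// Returns the {high, low} 64 bit parts of the 128 bit product.
static std::pair<uint64_t, uint64_t> MulU64(uint64_t a, uint64_t b)
{
    const uint64_t aLo = a & 0xFFFFFFFFU, aHi = a >> 32;
    const uint64_t bLo = b & 0xFFFFFFFFU, bHi = b >> 32;
    const uint64_t ll = aLo * bLo;
    const uint64_t lh = aLo * bHi;
    const uint64_t hl = aHi * bLo;
    const uint64_t hh = aHi * bHi;
    // Sum the cross terms at the 2^32 position, keeping the carry that
    // propagates into the high 64 bits.
    const uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFU) + (hl & 0xFFFFFFFFU);
    const uint64_t low = (mid << 32) | (ll & 0xFFFFFFFFU);
    const uint64_t high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return {high, low};
}
```

For example, (2^64-1) * (2^64-1) = 2^128 - 2^65 + 1, i.e. high part 0xFFFFFFFFFFFFFFFE and low part 1.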
<br />
<div style="text-align: justify;">
To make it easier, this partial 128 bit arithmetic is abstracted in a <a href="https://github.com/OSGeo/gdal/blob/c905203b4b4a745d4f63a6738c17713bf8c81f95/gdal/gcore/gdalrasterband.cpp#L3589">GDALUInt128 C++ class</a> that has different implementations depending on the CPU and compiler support.</div>
<br />
<div style="text-align: justify;">
Now that we have solved the final part of the computation, we can write the computation loop as follows:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> minimum = maximum = value[0]<br /> foreach value:<br /> if value < minimum: minimum = value<br /> else if value > maximum: maximum = value<br /> sum = sum + value<br /> sum_square = sum_square + value * value</span></span><br />
<br />
Can we do better? A bit of loop unrolling can help:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> minimum = maximum = value[0]<br /> foreach value pair (value1, value2):<br /> if value1 < minimum: minimum = value1<br /> else if value1 > maximum: maximum = value1<br /> sum = sum + value1<br /> sum_square = sum_square + value1 * value1<br /> if value2 < minimum: minimum = value2<br /> else if value2 > maximum: maximum = value2<br /> sum = sum + value2<br /> sum_square = sum_square + value2 * value2<br /> (deal with potential remaining pixel if odd number of pixels)</span></span><br />
<br />
<div style="text-align: justify;">
If we start by comparing value1 and value2, we can actually save a comparison (resulting in 3 comparisons per pair of pixels, instead of 4):</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> minimum = maximum = value[0]<br /> foreach value pair (value1, value2):<br /> if value1 < value2:<br /> if value1 < minimum: minimum = value1<br /> if value2 > maximum: maximum = value2<br /> else:<br /> if value2 < minimum: minimum = value2<br /> if value1 > maximum: maximum = value1<br /> sum = sum + value1<br /> sum_square = sum_square + value1 * value1<br /> sum = sum + value2<br /> sum_square = sum_square + value2 * value2<br /> (deal with potential remaining pixel if odd number of pixels)</span></span><br />
<br />
<div style="text-align: justify;">
This improvement alone dramatically reduces the computation time from 1m10s to 7s, when computing the statistics 50 times on a 10000 x 10000 pixel raster.</div>
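In plain C++, the pairwise min/max trick could be sketched as follows (an illustrative sketch, not GDAL's actual code):

```cpp
#include <cstddef>
#include <cstdint>

// Min/max with ~1.5 comparisons per element: compare the two elements of
// each pair against each other first, then only the smaller one against
// the running minimum and the larger one against the running maximum.
// Requires count >= 1.
static void MinMaxPairwise(const uint8_t* values, std::size_t count,
                           uint8_t& minimum, uint8_t& maximum)
{
    minimum = maximum = values[0];
    std::size_t i = 1;  // values[0] already accounted for
    for (; i + 1 < count; i += 2)
    {
        const uint8_t v1 = values[i];
        const uint8_t v2 = values[i + 1];
        if (v1 < v2)
        {
            if (v1 < minimum) minimum = v1;
            if (v2 > maximum) maximum = v2;
        }
        else
        {
            if (v2 < minimum) minimum = v2;
            if (v1 > maximum) maximum = v1;
        }
    }
    if (i < count)  // one remaining unpaired pixel
    {
        const uint8_t v = values[i];
        if (v < minimum) minimum = v;
        if (v > maximum) maximum = v;
    }
}
```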
<br />
<h4>
Parallelization with SSE2</h4>
<div style="text-align: justify;">
We have not yet explored the parallelization of the algorithm. One way to do it would be to use multi-threading, but for Intel-compatible CPU, we can also explore the capabilities of the SIMD (Single Instruction/Multiple Data) instruction set. On 64bit Intel, the <a href="https://en.wikipedia.org/wiki/SSE2">SSE2</a> instruction set, which offers vectorized operations on integers, is guaranteed to be always present. 16 registers (XMM0 to XMM15) are available, each 128 bit wide.</div>
<br />
<div style="text-align: justify;">
So each register is wide enough to hold 16 packed int8/uint8, 8 packed int16/uint16, 4 packed int32/uint32 or 2 packed int64/uint64, depending on the wished representation. A diverse set of operations is offered, generally operating on the sub-parts of each register independently. For example c=_mm_add_epi8(a,b) will independently compute c[i]=a[i]+b[i] for i=0 to 15, and that in just one CPU cycle. _mm_add_epi16() will work on packed uint16, etc. To add some salt, not all operators are available for all elementary subtypes however.</div>
<br />
<div style="text-align: justify;">
Compilers are supposed to be able to automatically vectorize some C code, but in practice they rarely manage to do so for real world code, hence requiring the programmer to use the SIMD instruction set by hand. All major compilers (gcc, clang, Visual Studio C/C++) offer access to the SSE2 instruction set through "intrinsics", which are C inline functions that wrap the corresponding assembly instructions while still being C/C++. This allows the compiler to do the register allocation and various other optimizations (such as re-ordering), which is a huge win over coding directly in assembly. The <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#">Intel intrinsics guide</a> is a useful resource to find the appropriate intrinsics.</div>
<br />
<div style="text-align: justify;">
So a tentative vectorized version of our algorithm would be:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_minimum = vector_of_16_bytes[0]<br /> v_maximum = vector_of_16_bytes[0]<br /> v_sum = vector_of_16_zeros<br /> v_sum_square = vector_of_16_zeros<br /><br /> foreach vector_of_16_bytes v:<br /> v_minimum = vector_minimum(v_minimum, v)<br /> v_maximum = vector_maximum(v_maximum, v)<br /> v_sum = vector_add(v_sum, v)<br /> v_sum_square = vector_add(v_sum_square, vector_mul(v, v))<br /><br /> minimum = minimum_of_16_values(v_minimum)<br /> maximum = maximum_of_16_values(v_maximum)<br /> sum = sum_of_X??_values(v_sum)<br /> sum_square = sum_of_X??_values(v_sum_square)<br /> (deal with potential remaining pixels if number of pixels is not multiple of 16)</span></span><br />
<br />
<div style="text-align: justify;">
vector_minimum and vector_maximum do exist, as _mm_min_epu8 and _mm_max_epu8. But for vector_add, which variant to use: _mm_add_epi8, _mm_add_epi16, _mm_add_epi32 or _mm_add_epi64? Well, none directly. We want to add uint8 values, but the result cannot fit on a uint8 (255+255=510). The same holds for sum_square. The result of each square multiplication requires at least a uint16, and since we want to loop several times, we need at least a uint32 accumulator. We designed the overall algorithm to handle a uint64 accumulator, but using that width in the tight loop would decrease the performance of the vectorization. So we decompose our loop into one outer loop and one inner loop. The inner loop does as many iterations as possible while still not overflowing a uint32 accumulator, that is (2^32-1)/(255*255) = 66051.xxxx iterations, which we round down to the closest multiple of 16.</div>
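A scalar sketch of this two-level accumulation scheme (illustrative code, not GDAL's; the constant and function names are mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Two-level accumulation: the inner loop uses uint32 accumulators and runs
// at most kInnerIters iterations, so that even an all-255 input cannot
// overflow them (kInnerIters * 255 * 255 <= UINT32_MAX); the outer loop
// flushes the partial sums into uint64 accumulators.
static void SumAndSumSquare(const uint8_t* values, std::size_t count,
                            uint64_t& sum, uint64_t& sum_square)
{
    // 66051 rounded down to a multiple of 16, as in the text.
    const std::size_t kInnerIters = 66048;
    sum = 0;
    sum_square = 0;
    std::size_t i = 0;
    while (i < count)
    {
        const std::size_t chunk = std::min(kInnerIters, count - i);
        uint32_t local_sum = 0;
        uint32_t local_sum_square = 0;
        for (std::size_t j = 0; j < chunk; ++j)
        {
            const uint32_t v = values[i + j];
            local_sum += v;
            local_sum_square += v * v;
        }
        sum += local_sum;
        sum_square += local_sum_square;
        i += chunk;
    }
}
```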
<br />
So what about v_sum = vector_add(v_sum, v)?<br />
<div style="text-align: justify;">
The first idea would be to extract the 4 lowest order bytes of v, unpack them so that each fits on a uint32, and then use _mm_add_epi32 to add them into the v_sum accumulator.</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_sum = _mm_add_epi32(v_sum, _mm_unpacklo_epi16(_mm_unpacklo_epi8(v, zero), zero))</span></span><br />
<div style="text-align: justify;">
_mm_unpacklo_epi8(v, zero) expands the 8 lowest order bytes of v as 8 uint16. And similarly _mm_unpacklo_epi16(v, zero) expands the 4 lowest order uint16 of v as 4 uint32.</div>
<br />
And then repeat that with the 3 other groups of 4 bytes:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_sum = _mm_add_epi32(v_sum, _mm_unpacklo_epi16(_mm_unpacklo_epi8(_mm_shuffle_epi32(v, 1), zero), zero))<br /> v_sum = _mm_add_epi32(v_sum, _mm_unpacklo_epi16(_mm_unpacklo_epi8(_mm_shuffle_epi32(v, 2), zero), zero))<br /> v_sum = _mm_add_epi32(v_sum, _mm_unpacklo_epi16(_mm_unpacklo_epi8(_mm_shuffle_epi32(v, 3), zero), zero))</span></span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But we can do better thanks to the <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=sad&techs=SSE2">_mm_sad_epu8</a> intrinsics. It is designed to "compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst." If we notice that ABS(x-0) = x when x >= 0, then it does just what we want.</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_sum = _mm_add_epi64(v_sum, _mm_sad_epu8(v, zero))</span></span><br />
<br />
<div style="text-align: justify;">
Pedantic note: we can actually use _mm_add_epi32, since there is no risk of overflow: 8 * 66051 * 255 fits on a uint32. The advantage of using _mm_add_epi32 is that, as we will use it elsewhere, the compiler can re-order additions to group them in pairs and benefit from their 0.5 cycle throughput.</div>
<br />
<div style="text-align: justify;">
_mm_sad_epu8() has a relatively high latency (5 cycles), but it is still a big win since it replaces 14 intrinsics of our initial version.</div>
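As a standalone illustration of the _mm_sad_epu8 trick, here is a sketch that sums 16 bytes in one instruction, assuming an x86 target where SSE2 and the emmintrin.h header are available (illustrative code, not GDAL's):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

#include <cstdint>

// Sum 16 unsigned bytes with a single _mm_sad_epu8 against zero: the
// absolute differences |v[i] - 0| are the values themselves, and the
// instruction horizontally adds each group of 8 into a 64-bit lane.
static uint32_t SumOf16Bytes(const uint8_t* bytes)
{
    const __m128i v =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    const __m128i zero = _mm_setzero_si128();
    const __m128i sad = _mm_sad_epu8(v, zero);  // two 16-bit partial sums
    // Bring the high 64-bit lane down and add the two partial sums
    // (each fits in 16 bits, so 32-bit extraction is safe).
    return static_cast<uint32_t>(_mm_cvtsi128_si32(sad)) +
           static_cast<uint32_t>(
               _mm_cvtsi128_si32(_mm_srli_si128(sad, 8)));
}
```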
<br />
<div style="text-align: justify;">
What about the computation of the square value? There is no mnemonic to directly multiply packed bytes and get the resulting packed uint16 (or even better uint32, since that is the type we eventually want to accumulate on, to be able to do several iterations of our loop!). One approach is to take the 8 lowest order bytes, un-pack them to uint16, and use the _mm_mullo_epi16() intrinsics that does uint16 x uint16 -> uint16. Then take the 4 lowest order uint16 of this intermediate result, un-pack them to uint32 and finally use _mm_add_epi32 to accumulate them in v_sum_square.</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_low = _mm_unpacklo_epi8(v, zero)<br /> v_low_square = _mm_mullo_epi16(v_low, v_low)<br /> v_sum_square = _mm_add_epi32(v_sum_square, _mm_unpacklo_epi16(v_low_square, zero))</span></span><br />
<br />
<div style="text-align: justify;">
Then repeat the operation with the 4 upper order uint16 of the intermediate result.</div>
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"><br /> v_sum_square = _mm_add_epi32(v_sum_square,<br /> _mm_unpacklo_epi16(_mm_shuffle_epi32(v_low_square, 2 | (3 <<2)), zero) )</span></span><br />
<br />
<div style="text-align: justify;">
_mm_shuffle_epi32(v, 2 | (3 <<2)) is a trick to replicate the high 64 bits of a XMM register into its low 64 bits. We don't care about the values of the resulting high 64 bits since they will be lost with the later unpack operations.</div>
<br />
And then repeat the whole process with the 8 highest order bytes.<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"><br /> v_high = _mm_unpacklo_epi8(_mm_shuffle_epi32(v, 2 | (3 <<2)), zero)<br /> v_high_square = _mm_mullo_epi16(v_high, v_high)<br /> v_sum_square = _mm_add_epi32(v_sum_square, _mm_unpacklo_epi16(v_high_square, zero))<br /> v_sum_square = _mm_add_epi32(v_sum_square,<br /> _mm_unpacklo_epi16(_mm_shuffle_epi32(v_high_square, 2 | (3 <<2)), zero) )</span></span><br />
<br />
<div style="text-align: justify;">
We can actually do much better with the <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=madd&techs=SSE2">_mm_madd_epi16()</a> mnemonics that "Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results". This is really close to what we need. We just need to prepare uint16/int16 integers (the sign convention does not matter here, since a uint8 zero-extended to 16 bit is valid both as a uint16 and as an int16)</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v_low_16bit = _mm_unpacklo_epi8(v, zero)<br /> v_sum_square = _mm_add_epi32(v_sum_square, _mm_madd_epi16(v_low_16bit, v_low_16bit))<br /> v_high_16bit = _mm_unpacklo_epi8(_mm_shuffle_epi32(v, 2 | (3 <<2)), zero)<br /> v_sum_square = _mm_add_epi32(v_sum_square, _mm_madd_epi16(v_high_16bit, v_high_16bit))</span></span><br />
<br />
<div style="text-align: justify;">
The latencies and throughput of _mm_mullo_epi16 and _mm_madd_epi16 are the same, so the second version is clearly a big win.</div>
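Here is a standalone sketch of the _mm_madd_epi16 approach, again assuming an x86 target with SSE2 (illustrative code, not GDAL's; for brevity it uses _mm_unpackhi_epi8 to reach the 8 highest order bytes instead of the _mm_shuffle_epi32 trick of the text):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

#include <cstdint>

// Sum of squares of 16 unsigned bytes: zero-extend each half to 16 bit
// (valid both as uint16 and int16 since values are <= 255), then
// _mm_madd_epi16 squares each lane and adds adjacent pairs into int32.
static uint32_t SumOfSquaresOf16Bytes(const uint8_t* bytes)
{
    const __m128i v =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    const __m128i zero = _mm_setzero_si128();
    const __m128i low = _mm_unpacklo_epi8(v, zero);   // bytes 0..7 as u16
    const __m128i high = _mm_unpackhi_epi8(v, zero);  // bytes 8..15 as u16
    __m128i acc = _mm_madd_epi16(low, low);           // 4 x int32
    acc = _mm_add_epi32(acc, _mm_madd_epi16(high, high));
    // Horizontal sum of the 4 int32 lanes.
    acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
    acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 4));
    return static_cast<uint32_t>(_mm_cvtsi128_si32(acc));
}
```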
<br />
<h4>
Use of AVX2</h4>
<div style="text-align: justify;">
We can tweak performance a bit further with a 2x loop unrolling, which enables the compiler to re-order some operations so that those with a throughput of 0.5 cycle (such as _mm_add_epi32, _mm_unpacklo_epi8) are consecutive, and thus execute 2 of them in a single cycle. When doing so, we can notice that we are operating on a virtual 256 bit register. But 256 bit registers do exist in the AVX2 instruction set, introduced in relatively recent hardware (2013 for Intel Haswell). AVX/AVX2 offers the YMM registers, the equivalent of the XMM registers but with a doubled bit width (the 128 bit low part of a YMM register is its corresponding XMM register). One particularity of the YMM registers is that they operate on two quite distinct "128 bit lanes", but you can still extract each lane.</div>
<br />
The port to AVX2 is quite straightforward:<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> v = _mm256_load_si256(data + i)<br /> v_sum = _mm256_add_epi32(v_sum, _mm256_sad_epu8(v, zero))<br /> v_low_16bit = _mm256_cvtepu8_epi16(_mm256_extracti128_si256(v, 0));<br /> v_sum_square = _mm256_add_epi32(v_sum_square, _mm256_madd_epi16(v_low_16bit, v_low_16bit))<br /> v_high_16bit = _mm256_cvtepu8_epi16(_mm256_extracti128_si256(v, 1));<br /> v_sum_square = _mm256_add_epi32(v_sum_square, _mm256_madd_epi16(v_high_16bit, v_high_16bit))</span></span><br />
<br />
<div style="text-align: justify;">
_mm256_extracti128_si256(v, 0) extracts the 128 bit lower part of the register, and _mm256_extracti128_si256(v, 1) the 128 bit upper part.</div>
<br />
<div style="text-align: justify;">
The good news is that we can have a single code base for the SSE2 and AVX2 variants, by writing the code in the AVX2 style. In the SSE2 case, we in fact define the _mm256 functions with their corresponding _mm 128bit functions operating on the low and high 128 bit parts. For example:</div>
<br />
<blockquote class="tr_bq">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">static inline GDALm256i GDALmm256_add_epi32(GDALm256i r1, GDALm256i r2)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">{</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> GDALm256i reg;</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> reg.low = _mm_add_epi32(r1.low, r2.low);</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> reg.high = _mm_add_epi32(r1.high, r2.high);</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> return reg;</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">}</span></span><br />
</blockquote>
<br />
The AVX2-with-SSE2 emulation can be found in:<br />
<a href="https://github.com/OSGeo/gdal/blob/trunk/gdal/gcore/gdal_avx2_emulation.hpp">https://github.com/OSGeo/gdal/blob/trunk/gdal/gcore/gdal_avx2_emulation.hpp</a><br />
<br />
Thanks to inlining and the usual compiler optimizations, this will be equivalent to our hand-written 2x unrolled version! The final code is <a href="https://github.com/OSGeo/gdal/blob/c905203b4b4a745d4f63a6738c17713bf8c81f95/gdal/gcore/gdalrasterband.cpp#L4030">here</a>.<br />
<br />
<div style="text-align: justify;">
Regarding timings, our SSE2-emulated AVX2 version runs in 1.6s, roughly a 4x improvement over the portable optimized C version. On hardware capable of AVX2, the pure AVX2 version is 15% faster than the SSE2-emulated one: definitely not enough to justify a dedicated code path, but since we have a unified one, it comes almost for free, provided that the code is explicitly compiled with AVX2 enabled.</div>
<br />
<h4>
Nodata values</h4>
<div style="text-align: justify;">
Up to now, we have ignored the potential existence of nodata values. When computing statistics, we do not want pixels that match the nodata value to be taken into account in the minimum, maximum, mean or standard deviation.</div>
<br />
<div style="text-align: justify;">
In the pure C approach, this is easy. Just ignore pixels that match the nodata value:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> minimum = maximum = value[0]<br /> foreach value:<br /> if value != nodata:<br /> valid_pixels = valid_pixels + 1<br /> minimum = min(minimum, value)<br /> maximum = max(maximum, value)<br /> sum = sum + value<br /> sum_square = sum_square + value * value</span></span><br />
<br />
<div style="text-align: justify;">
We cannot directly translate that with SSE2/AVX2 mnemonics, since the result of the value != nodata test can be different for each of the 32 packed bytes of the (virtual) AVX2 register, and testing each component of the vector register individually would kill performance to the point where it would be worse than the pure C approach!</div>
<br />
<div style="text-align: justify;">
We can however rewrite the above in a vector friendly way with:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> minimum = maximum = first value that is not nodata<br /> neutral_value = minimum (or any value in final [min,max] range that is not nodata)<br /> foreach value:<br /> validity_flag = if (value != nodata) 0xFF else 0<br /> value_potentially_set_to_zero = value & validity_flag<br /> value_potentially_set_to_neutral = (value & validity_flag) | (neutral_value & ~validity_flag)<br /> valid_pixels = valid_pixels + validity_flag / 255<br /> minimum = min(minimum, value_potentially_set_to_neutral)<br /> maximum = max(maximum, value_potentially_set_to_neutral)<br /> sum = sum + value_potentially_set_to_zero<br /> sum_square = sum_square + value_potentially_set_to_zero * value_potentially_set_to_zero</span></span><br />
<br />
<div style="text-align: justify;">
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">(value & validity_flag) | (neutral_value & ~validity_flag)</span></span> is a quite common pattern in SIMD code to implement an if/then/else construct without branches (in classic scalar code, if/then/else branches are often more efficient, since the CPU can do branch prediction)</div>
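The select pattern can be demonstrated on a single scalar byte (an illustrative sketch of the per-lane logic; real SIMD code applies it to 16 or 32 bytes at once):

```cpp
#include <cstdint>

// Branchless if/then/else on a byte, mirroring the SIMD pattern
// (value & mask) | (neutral & ~mask): returns value when it is valid,
// the neutral value when it equals the nodata value.
static uint8_t SelectIfValid(uint8_t value, uint8_t nodata,
                             uint8_t neutral_value)
{
    // 0xFF when value != nodata, 0x00 otherwise, computed without a branch.
    const uint8_t validity_flag =
        static_cast<uint8_t>(0xFF * (value != nodata));
    return static_cast<uint8_t>(
        (value & validity_flag) |
        (neutral_value & static_cast<uint8_t>(~validity_flag)));
}
```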
<br />
<div style="text-align: justify;">
The only complication is that there is no SSE2 intrinsics for non-equality testing, so we have to transform the logic a bit to use equality testing only. We will also remove the need for a division in the loop:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> foreach value:<br /> invalidity_flag = if (value == nodata) 0xFF else 0<br /> value_potentially_set_to_zero = value & ~invalidity_flag<br /> value_potentially_set_to_neutral = (value & ~invalidity_flag) | (neutral_value & invalidity_flag)<br /> invalid_pixels_mul_255 = invalid_pixels_mul_255 + invalidity_flag<br /> minimum = min(minimum, value_potentially_set_to_neutral)<br /> maximum = max(maximum, value_potentially_set_to_neutral)<br /> sum = sum + value_potentially_set_to_zero<br /> sum_square = sum_square + value_potentially_set_to_zero * value_potentially_set_to_zero<br /><br /> valid_pixels = total_pixels - invalid_pixels_mul_255 / 255</span></span><br />
<br />
<div style="text-align: justify;">
The computation of invalid_pixels_mul_255 in a vectorized way is the same as for v_sum, using the _mm_sad_epu8() trick. The resulting SSE2 code is:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> foreach vector_of_16_bytes v:<br /> v_invalidity_flag = _mm_cmpeq_epi8(v, v_nodata)<br /> v_value_potentially_set_to_zero = _mm_andnot_si128(v_invalidity_flag, v)<br /> v_value_potentially_set_to_neutral = _mm_or_si128(<br /> v_value_potentially_set_to_zero, _mm_and_si128(v_invalidity_flag, v_neutral))<br /> v_invalid_pixels_mul_255 = _mm_add_epi32(v_invalid_pixels_mul_255,<br /> _mm_sad_epu8(v_invalidity_flag, zero))<br /> [ code for min, max operating on v_value_potentially_set_to_neutral ]<br /> [ code for sum and sum_square operating on v_value_potentially_set_to_zero ]</span></span><br />
<br />
<div style="text-align: justify;">
The <a href="https://github.com/OSGeo/gdal/blob/c905203b4b4a745d4f63a6738c17713bf8c81f95/gdal/gcore/gdalrasterband.cpp#L3901">transposition to AVX2</a> is straightforward.</div>
<br />
<div style="text-align: justify;">
We can notice that this version taking nodata values into account can only be used once we have hit a pixel that is not the nodata value, so as to be able to initialize the neutral value.</div>
<br />
<h4>
What about uint16 rasters?</h4>
<br />
<div style="text-align: justify;">
The same general principles apply. If we still want to limit ourselves to operating with at most uint64 accumulators, given that the maximum square value of a uint16 is 65535*65535, this limits us to rasters of 2^64/(65535*65535) ~= 4 billion pixels, which remains acceptable for common use cases.</div>
<br />
<div style="text-align: justify;">
One oddity of the SSE2 instruction set is that it only includes _mm_min_epi16() / _mm_max_epi16() mnemonics, that is to say ones that operate on signed int16. The _mm_min_epu16() that operates on uint16 was only introduced in the later SSE4.1 instruction set (which is nevertheless quite commonly found in CPUs that are not too old). </div>
<br />
<div style="text-align: justify;">
There are tricks to emulate _mm_min_epu16() in pure SSE2, using saturated subtraction and masking:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> // if x <= y, then mask bits will be set to 1.<br /> mask = _mm_cmpeq_epi16( _mm_subs_epu16(x, y), zero )<br /><br /> // select bits from x when mask is 1, y otherwise<br /> min(x,y) = _mm_or_si128(_mm_and_si128(mask, x), _mm_andnot_si128(mask, y)); </span></span><br />
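The per-lane logic of this emulation can be modelled in scalar C++ (an illustrative sketch of what each uint16 lane computes, not SIMD code):

```cpp
#include <cstdint>

// Scalar model of saturated unsigned subtraction (_mm_subs_epu16 on one
// lane): the result is zero exactly when x <= y.
static uint16_t SaturatedSub(uint16_t x, uint16_t y)
{
    return x > y ? static_cast<uint16_t>(x - y) : static_cast<uint16_t>(0);
}

// Scalar model of the SSE2 emulation of _mm_min_epu16: build a full mask
// from the "x <= y" test (what _mm_cmpeq_epi16 against zero yields), then
// select x or y with bitwise operations.
static uint16_t MinU16(uint16_t x, uint16_t y)
{
    const uint16_t mask =
        (SaturatedSub(x, y) == 0) ? static_cast<uint16_t>(0xFFFF)
                                  : static_cast<uint16_t>(0);
    return static_cast<uint16_t>((x & mask) |
                                 (y & static_cast<uint16_t>(~mask)));
}
```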
<br />
<div style="text-align: justify;">
Another way is to shift the unsigned values by -32768, so as to operate on signed 16bit values.</div>
<br />
<div style="text-align: justify;">
This -32768 shift trick is also necessary because, as in the byte case, we want to still be able to use the _mm_madd_epi16 intrinsics, which operates on signed int16, to compute the sum of square values. One subtlety to observe is that when you operate on 2 consecutive pixels equal to 0, _mm_madd_epi16 does:</div>
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;"> (0 - 32768) * (0 - 32768) + (0 - 32768) * (0 - 32768)<br />= 1073741824 + 1073741824<br />= 2147483648 = 2^31</span></span><br />
<br />
<div style="text-align: justify;">
Which actually overflows the range of signed int32 ( [-2^31, 2^31-1] )! The good news is that _mm_madd_epi16 does not saturate the result, so it actually returns 0x80000000. This would normally be interpreted as -2^31 in signed int32 convention, but as we know that the result of _mm_madd_epi16(x,x) is necessarily positive, we can still correctly interpret the result as a uint32 value. This is where you feel lucky that Intel chose the two's complement convention for signed integers.</div>
<br />
<div style="text-align: justify;">
To compute the sum of values, there is no nice trick equivalent to _mm_sad_epu8, so we just do it the boring way: unpack separately the 64bit low and high parts of the value register from uint16 to uint32 and accumulate them with _mm_add_epi32.</div>
<br />
<div style="text-align: justify;">
Exactly as for the byte case, the uint16 case can be easily <a href="https://github.com/OSGeo/gdal/blob/c905203b4b4a745d4f63a6738c17713bf8c81f95/gdal/gcore/gdalrasterband.cpp#L4149">transposed to AVX2</a> or emulated-AVX2.</div>
<br />
<h4>
Conclusion</h4>
<br />
<div style="text-align: justify;">
Conversions between integer and floating point can be costly, so avoiding them as much as possible is a big win (provided that you make sure not to overflow your integer accumulators).</div>
<br />
<div style="text-align: justify;">
In theory, the gains offered by a SSE2/AVX2 optimized version are at most limited to a factor of width_of_vector_register / width_of_elementary_type, so, for bytes and SSE2, to 16. But the gain is often smaller than that, so only do that once you have an already optimized portable C version (or if the SIMD instruction set includes a dedicated intrinsic that does just what you want).</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-20196048697904315572016-05-02T02:48:00.002-07:002016-05-02T02:48:58.781-07:00GDAL/OGR 2.1.0 released<div style="text-align: justify;">
On behalf of the <a href="http://www.gdal.org/">GDAL/OGR</a> development
team and community, I am pleased to announce the release of GDAL/OGR
2.1.0. GDAL/OGR is a C++ geospatial data access library for raster and
vector file formats, databases and web services. It includes bindings
for several languages, and a variety of command line tools.</div>
<br />
The 2.1.0 release is a major new feature release with the following highlights:<br />
<ul>
<li><a href="http://trac.osgeo.org/gdal/wiki/rfc26_blockcache">RFC 26: Add hash-set band block cache implementation for very large rasters (WMS, WMTS, ...)</a> </li>
<li><a href="https://trac.osgeo.org/gdal/wiki/rfc48_geographical_networks_support">RFC 48: Geographical networks support (GNM)</a> </li>
<li><a href="https://trac.osgeo.org/gdal/wiki/rfc58_removing_dataset_nodata_value">RFC 58: Add DeleteNoDataValue()</a> </li>
<li><a href="https://trac.osgeo.org/gdal/wiki/rfc59.1_utilities_as_a_library">RFC 59.1: Make GDAL/OGR utilities available as library functions</a> </li>
<li><a href="https://trac.osgeo.org/gdal/wiki/rfc60_improved_roundtripping_in_ogr">RFC 60: Improved round-tripping in OGR</a>. Implemented in GeoJSON driver</li>
<li><a href="https://trac.osgeo.org/gdal/wiki/rfc61_support_for_measured_geometries">RFC 61: Support for measured geometries</a>. Implemented in Shapefile, PostgreSQL/PostGIS, PGDump, MEM, SQLite, GeoPackage,
FileGDB, OpenFileGDB, CSV, VRT</li>
</ul>
<ul>
<li>New GDAL/raster drivers:
<ul>
<li><a class="ext-link" href="http://gdal.org/frmt_cals.html"><span class="icon">​</span>CALS</a>: read/write driver for CALS Type I rasters
</li>
<li><a class="ext-link" href="http://gdal.org/drv_db2_raster.html"><span class="icon">​</span>DB2</a>: read/write support for DB2 database (Windows only)
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_various.html#ISCE"><span class="icon">​</span>ISCE</a>: read/write driver
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_marfa.html"><span class="icon">​</span>MRF</a>: read/write driver for Meta Raster Format</li>
<li><a class="ext-link" href="http://gdal.org/frmt_safe.html"><span class="icon">​</span>SAFE</a>: read driver for ESA SENTINEL-1 SAR products
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_sentinel2.html"><span class="icon"></span>SENTINEL2</a>: read driver for ESA SENTINEL-2 L1B/L1C/L2A products
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_wmts.html"><span class="icon">​</span>WMTS</a>: read driver for OGC WMTS services
</li>
</ul>
</li>
<li>New OGR/vector drivers:
<ul>
<li><a class="ext-link" href="http://gdal.org/drv_amigocloud.html"><span class="icon">​</span>AmigoCloud</a>: read/write support for AmigoCloud mapping platform
</li>
<li><a class="ext-link" href="http://gdal.org/drv_db2.html"><span class="icon">​</span>DB2</a>: read/write support for DB2 database (Windows only)
</li>
<li><a class="ext-link" href="http://gdal.org/drv_mongodb.html"><span class="icon">​</span>MongoDB</a>: read/write driver
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_netcdf_vector.html"><span class="icon">​</span>netCDF</a>: read/write driver
</li>
<li><a class="ext-link" href="http://gdal.org/drv_vdv.html"><span class="icon">​</span>VDV</a>: read/write VDV-451/VDV-452 driver, with specialization for the Austrian official open government street graph format
</li>
</ul>
</li>
<li>Significantly improved drivers:
<ul>
<li><a class="ext-link" href="http://gdal.org/drv_csv.html"><span class="icon">​</span>CSV</a>: new options, editing capabilities of existing file
</li>
<li><a class="ext-link" href="http://gdal.org/drv_elasticsearch.html"><span class="icon"></span>ElasticSearch</a>: read support, and ability to write any geometry type
</li>
<li><a class="ext-link" href="http://gdal.org/drv_geojson.html"><span class="icon">​</span>GeoJSON</a>: editing capabilities of existing file, "native data" (RFC 60) support
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_mbtiles.html"><span class="icon"></span>MBTiles</a>: add raster write support; fixes in open support
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_pdf.html"><span class="icon">​</span>PDF</a>: add PDFium library as a possible back-end.
</li>
<li><a class="ext-link" href="http://gdal.org/drv_plscenes.html"><span class="icon">​</span>PLScenes</a>: add support for V1 API
</li>
<li><a class="ext-link" href="http://gdal.org/gdal_vrttut.html"><span class="icon">​</span>VRT</a>: on-the-fly pan-sharpening
</li>
<li><a class="ext-link" href="http://gdal.org/frmt_gtiff.html"><span class="icon">​</span>GTiff</a>: multi-threaded compression for some compression methods
</li>
</ul>
</li>
<li>Port library: add <a class="ext-link" href="http://www.gdal.org/cpl__vsi_8h.html#a5b4754999acd06444bfda172ff2aaa16"><span class="icon">​</span>/vsis3/</a>, <a class="ext-link" href="http://www.gdal.org/cpl__vsi_8h.html#a126c1e0314bbd7e4661bc526f45032c5"><span class="icon">​</span>/vsis3_streaming/</a>, <a class="ext-link" href="http://www.gdal.org/cpl__vsi_8h.html#a5e20b79947f58970f5514b3eb9a524a9"><span class="icon">​</span>/vsicrypt/</a> virtual file systems</li>
<li>Upgrade to EPSG database v8.8 </li>
<li>General sanitization pass to clean-up code, fix a lot of compiler warnings,
as well as issues pointed by static code analyzers.
</li>
<li>Fixes in a number of drivers to be more robust against corrupted files.</li>
</ul>
<div style="text-align: justify;">
You can also find <a href="http://trac.osgeo.org/gdal/wiki/Release/2.1.0-News">more complete information</a> on the new features and fixes of the 2.1.0 release.</div>
<br />
The release can be downloaded from:<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdal210.zip">http://download.osgeo.org/gdal/2.1.0/gdal210.zip</a> - source as a zip<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdal-2.1.0.tar.gz">http://download.osgeo.org/gdal/2.1.0/gdal-2.1.0.tar.gz</a> - source as .tar.gz<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdal-2.1.0.tar.xz">http://download.osgeo.org/gdal/2.1.0/gdal-2.1.0.tar.xz</a> - source as .tar.xz<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdal-grass-2.1.0.tar.gz">http://download.osgeo.org/gdal/2.1.0/gdal-grass-2.1.0.tar.gz</a> - source of GDAL GRASS plugin<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdalautotest-2.1.0.tar.gz">http://download.osgeo.org/gdal/2.1.0/gdalautotest-2.1.0.tar.gz</a> - test suite<br />
* <a href="http://download.osgeo.org/gdal/2.1.0/gdal210doc.zip">http://download.osgeo.org/gdal/2.1.0/gdal210doc.zip</a> - documentation/website<br />
<br />
<br />
<br />
<div style="text-align: justify;">
As there have been a few changes that affect the behaviour of the library, developers are strongly advised to read the <a href="https://svn.osgeo.org/gdal/branches/2.1/gdal/MIGRATION_GUIDE.TXT">migration guide</a>.</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-70464456662113456332016-03-06T14:16:00.000-08:002016-03-06T14:16:51.578-08:00Paris OSGeo Code Sprint 2016 debrief<div style="text-align: justify;">
While my memories are still fresh, here is a report of this <a href="https://wiki.osgeo.org/wiki/Paris_Code_Sprint_2016">week of code sprinting</a>. First, a big thanks to Olivier Courtin for organizing this event, to all <a href="https://wiki.osgeo.org/wiki/Paris_Code_Sprint_2016#Gold_Sponsors">sponsors</a> that brought up the money to make it happen and to the Mozilla Foundation for hosting us in the most scenic coding venue I've ever seen.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://wiki.osgeo.org/images/thumb/d/d1/Logo-TOSPrint_Paris.png/600px-Logo-TOSPrint_Paris.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="135" src="https://wiki.osgeo.org/images/thumb/d/d1/Logo-TOSPrint_Paris.png/600px-Logo-TOSPrint_Paris.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIZoa57tLOKog51A0TsQn4y0a0127BNUBmcl3kGd3QKJoOXr36eY-Ij257dHTb0F8C3iYKochjFFJ1BvnPPFTDMlRinsQK2qYt5tl57Cj3wdYqvHqVhmU57Do3C_jRC4-usjbobU9qGgY/s1600/20160223_100430.jpg" imageanchor="1"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIZoa57tLOKog51A0TsQn4y0a0127BNUBmcl3kGd3QKJoOXr36eY-Ij257dHTb0F8C3iYKochjFFJ1BvnPPFTDMlRinsQK2qYt5tl57Cj3wdYqvHqVhmU57Do3C_jRC4-usjbobU9qGgY/s320/20160223_100430.jpg" width="320" /></a></div>
<br />
<br />
<div style="text-align: justify;">
As expected, I mostly concentrated on GDAL work. My main tasks were related to polishing and extending the work initiated by Ari Jolma for the <a href="https://trac.osgeo.org/gdal/wiki/rfc61_support_for_measured_geometries">support of the "M dimension" of geometries</a>, M standing for Measurement, a numeric property attached to each point/vertex that can encode different attributes: time, length, or any other interesting property beyond x, y and z.</div>
<div style="text-align: justify;">
Those good old shapefiles are still a bit quirky since they do not really distinguish between XYZ and XYZM geometries up-front. In fact, as soon as you have a Z component, <a href="https://lists.osgeo.org/pipermail/gdal-dev/2016-February/043717.html">the Shapefile specification requires an M value to be encoded</a>, even if not used. There is consequently a nodata value (any value lower than -10^38) for such cases. As M geometries are a bit esoteric, we want to avoid reporting them when they are not actually used. Consequently, a heuristic has been added to the shapefile driver to probe, by default, the first shape in the file and check whether it has meaningful M values. If not, the layer geometry type is just declared as XYZ. This should help with backward compatibility of software using GDAL. Implemented per <a href="https://trac.osgeo.org/gdal/changeset/33538">r33538</a> and <a href="https://trac.osgeo.org/gdal/changeset/33539">r33539</a>.</div>
<div style="text-align: justify;">
The support of M in the CSV driver was more straightforward (<a href="https://trac.osgeo.org/gdal/changeset/33544">r33544</a>), since the bulk of the work is of course done in the WKT importer/exporter.</div>
<div style="text-align: justify;">
Regarding the GeoPackage driver, the main need was to be able to correctly parse geometry headers with the XYM or XYZM bounding boxes that may be found. The main difficulty was testing that, since OGR itself only generates XY or XYZ bounding boxes, so hand-editing hexadecimal WKB was needed. Somewhat amusing with a broken laptop screen. Anyway, it was done through <a href="https://trac.osgeo.org/gdal/changeset/33551">r33551</a>.</div>
<div style="text-align: justify;">
Support for M geometries in SQLite/Spatialite required a number of small changes scattered through the driver code base, and new tests for the various variants (regular geometries vs compressed ones). The upgrade of this driver also makes it possible to use XYM/XYZM geometries with the <a href="http://gdal.org/ogr_sql_sqlite.html">SQLite SQL dialect</a> usable by all other drivers. Implemented per <a href="https://trac.osgeo.org/gdal/changeset/33554">r33554</a>.</div>
<div style="text-align: justify;">
The upgrade of the FileGDB and OpenFileGDB drivers gave me some headaches, as it turned out that support for writing M values was broken in the older FileGDB SDK 1.3. After upgrading to v1.4, things went much more smoothly. Support for M with the FileGDB v9.X format was also added. Implemented per <a href="https://trac.osgeo.org/gdal/changeset/33563">r33563</a>. For the nostalgics, the PGeo driver should also benefit from those changes, although this wasn't tested.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
On the MapServer front, in the middle of many other things, Thomas Bonfort merged in time for MapServer 7.0.1 an older pull request of mine, that I had forgotten about, to <a href="https://github.com/mapserver/mapserver/pull/5061">support 64 bit integer fields</a> that may now come with GDAL 2.0. I also backported a fix to <a href="https://github.com/mapserver/mapserver/commit/a752e7003d161dc5c9e6e7e9ecc91e354948627e">handle WMS TIME on contour layers</a>, in time for MapServer 6.4.3.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Aside from my own coding, I enjoyed spending time with other developers to help them with their GDAL tasks. With Rob Emmanuele, we tried to figure out how to make the "driver" that handles files accessible through HTTP/HTTPS better report errors, especially on Amazon S3 storage, so that upper library or application layers can better deal with them. In particular, you want to be able to distinguish a nonexistent resource (a typo in the URL, for example) from a valid one for which you have not specified the right credentials. This turned out to be much more difficult than I would have anticipated, since there are a lot of situations where we want errors when accessing files to be silent (for example when drivers probe for potential "sidecar" files that accompany main files: think of the .prj, .wld, .aux files), and there's no way in the current design to know when to be verbose or not. Rob finally came up with a design for a file system error reporting mechanism that is not verbose by default, but that may be queried by the code paths that want to report errors in a verbose way. This is still <a href="https://github.com/OSGeo/gdal/pull/98">work in progress</a>, but hopefully Rob should be able to polish it to be included in the upcoming GDAL 2.1 release (feature freeze at the end of this month).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
With Yann Chemin, we had quite a lot of fun exploring how to better support the catalog of spatial reference systems published by the <a href="http://www.iau.org/">IAU</a> (International Astronomical Union), which describes the SRS used for other planets and satellites. In particular, we discovered that some of those SRS use the Oblique Cylindrical Equal Area (OCEA) projection. This projection is supported by <a href="http://proj4.org/">proj.4</a> (thanks to Howard Butler for designing a modern website for this not always sexy but fundamental piece of software), but not by the OGR Spatial Reference (OSR) component of GDAL itself. The main challenge in making it available through OSR is to map the proj.4 parameters of the projection to parameter names in WKT. Documentation to do that is generally scarce, and we ended up opening the bible of projection experts, that is to say <a href="http://pubs.usgs.gov/pp/1395/report.pdf">"Map Projections - A Working Manual", by John P. Snyder, USGS Professional Paper 1395</a>, of which proj.4 is mostly the translation into C code. The book shed some light, at its page 80, on the OCEA projection. The interesting part of OCEA is that it comes with 2 variants... The gist of the support is now in this <a href="https://github.com/OSGeo/gdal/pull/101">pull request</a>, with some more work and research needed to clarify the remaining mysteries. In the meantime, GRASS can now benefit from IAU codes (<a href="https://trac.osgeo.org/grass/changeset/67950">r67950</a> and <a href="https://trac.osgeo.org/grass/changeset/67951">r67951</a>)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Ever wondered about the possible command line switches of GDAL/OGR utilities? Guillaume Pasero contributed <a href="https://trac.osgeo.org/gdal/ticket/6381">a bash completion script</a> to improve your user experience.<br />
<br />
<blockquote class="tr_bq">
$ ogr2ogr - (TAB character pressed)<br />-append --debug -dsco --format --help-general --locale --optfile -preserve_fid -skipfailures -sql -update<br />-a_srs -dialect -f --formats -lco -nln -overwrite -progress -spat -s_srs --version<br />--config -dim -fid -geomfield --license -nlt --pause -select -spat_srs -t_srs -where</blockquote>
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Regina Obe also worked on improving the ODBC support in OGR: <a href="https://trac.osgeo.org/gdal/ticket/6000">build support of Windows ODBC libraries with the mingw64 compiler</a>, ability to <a href="https://trac.osgeo.org/gdal/ticket/6394">support a large number of columns in tables</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<br />
<br />Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-3867455094037482812016-01-05T03:35:00.000-08:002016-01-05T03:35:28.551-08:00Software quality improvements in GDAL<div style="text-align: justify;">
As a new year is popping up, it is time to take good resolutions. To help you with that, especially if you are a C/C++ developer, this article gives feedback on efforts made over the last few months to improve the quality of the <a href="http://gdal.org/">GDAL/OGR</a> code base, and hopefully the resulting experience for the end user.
</div>
<br />
<h3>
Enable as many compiler warning options as possible</h3>
<br />
By default, C/C++ compilers enable a few warning categories, but this is far from being sufficient. You want to turn on extra warnings. The set of available warning options depends on the compiler, so you may want to autodetect which are available in your configure/cmake script. The below flags are used by GDAL with GCC and CLang:<br />
<br />
<br />
In the "Must have" category:
<br />
<ul>
<li><code>-Wall</code>: basic set of extra warnings</li>
<li><code>-Wextra</code>: a few more</li>
<li><code>-Wdeclaration-after-statement</code>: for C code, to avoid compilation errors with Microsoft Visual Studio.</li>
<li><code>-Werror=vla</code>: variable length arrays are a GCC extension not supported by Visual Studio</li>
<li><code>-Wformat</code>: detects errors in the use of printf()-like statements</li>
<li><code>-Werror=format-security</code>: error out on such errors that have security implications</li>
<li><code>-Wno-format-nonliteral</code>: this is an exception to allow the format string to be non-constant. We need that in a few cases in GDAL, but try without it if you can.</li>
<li><code>-Winit-self</code>: warn about uninitialized variables that are initialized with themselves</li>
</ul>
<br />
Good practice:<br />
<ul>
<li><code>-Wmissing-prototypes</code>: for C code</li>
<li><code>-Wmissing-declarations</code>: helped fixing <a href="https://github.com/rouault/gdal2/commit/4368db0f60dbfa528b2bc61dd1a78e363697e22d">mismatches</a> between function prototypes and their implementation.</li>
<li><code>-Wnon-virtual-dtor</code>: (C++) warn when a class with virtual methods lacks a virtual destructor</li>
<li><code>-Wlogical-op</code>: (GCC only) a few heuristics that detect wrong uses of the logical and/or operators. Can have some false positives sometimes, but helped find quite a few bugs like <a href="https://github.com/rouault/gdal2/commit/9df7243a3b64aa7fe2e45567b954e1f7760619e6">a condition that always evaluates to false</a>, <a href="https://github.com/rouault/gdal2/commit/1988aa0cc0556c7142a10ca1064b6451d44cd6a2">another one that always evaluates to true</a> (<a href="https://github.com/rouault/gdal2/commit/79549a7c669ad7a540c5606a98b8d35a564e8001">or this one too</a>) and <a href="https://github.com/rouault/gdal2/commit//aa007a9a631a7ab0dd9e84fd3224804253527454">my preferred one</a> (we were very lucky that, in ASCII, the single quote character is just before the open parenthesis, otherwise this check wouldn't have detected the issue). Interestingly, not all versions of GCC or CLang raise the same warnings, due to varying heuristics.</li>
<li><code>-Wunused-private-field</code>: (CLang only, C++) detect unused private members in classes.</li>
<li><code>-Wunused-parameter</code>: detects unused parameters. In C++, you can just omit the argument name if it is unused. In C, you can use a macro that will expand to __attribute__((__unused__)) on GCC-compatible compilers.</li>
<li><code>-Wnull-dereference</code>: detects situations where a NULL pointer dereference is possible. Only available in (unreleased at this time) GCC 6. A few false positives are possible (usually the warning can be worked around)</li>
<li><code>-Wduplicated-cond</code>: detects redundant conditions in if / else if constructs. Only available in GCC 6</li>
</ul>
<br />
Nice to have, but can require extensive efforts to fix: <br />
<ul>
<li><code>-Wshorten-64-to-32</code>: (CLang only) will detect potential portability problems for 64-bit builds.</li>
<li><code>-Wsign-compare</code>: warn on comparisons where the operands on both sides of the comparison operator do not have the same signedness.</li>
<li><code>-Wshadow</code>: detect variable "shadowing", for example a local variable with the same name as a global variable or a class member. This can help enforcing style conventions like using m_ to prefix member variables.</li>
<li><code>-ftrapv</code>: generates a runtime error if overflow occurs on signed integers. Such overflows are undefined behaviour in C/C++, so it is good to be able to catch them. It is somewhat redundant with -fsanitize=undefined, although it has been available in compilers for a longer time (but with ineffective implementations in older GCC versions; recent clang versions have it really working well). Perhaps only enable this in debug builds, as we do in GDAL.</li>
</ul>
<br />
And once you have cleaned up your code base, you can add the magic <code>-Werror</code> flag that will cause any warning to be treated as an error so as to maintain it in a warning-free state.<br />
<br />
Sometimes you have unfortunately to deal with external library headers that trigger compiler warnings themselves. Nothing you can really do about that. GCC and clang have an interesting workaround for that. Basically create your own header, and call <code>#pragma GCC system_header</code> before including the third-party headers. Here's an <a href="https://svn.osgeo.org/gdal/trunk/gdal/ogr/ogrsf_frmts/filegdb/filegdbsdk_headers.h">example</a>.<br />
<br />
In GDAL 2.0, enabling the above-mentioned warning options caused 3865 warnings to be raised. In GDAL 2.1.0dev, we cut that down to 0.<br />
<br />
For Visual Studio, enable /W4 for a more extensive set of warnings, and add exceptions when needed. In GDAL, we use the following exceptions (only enable them if really needed):<br />
<ul>
<li><code>/wd4127</code>: conditional expression is constant</li>
<li><code>/wd4251</code>: related to export of symbols</li>
<li><code>/wd4275</code>: related to export of symbols</li>
<li><code>/wd4100</code>: unreferenced formal parameter (this would be the equivalent of -Wno-unused-parameter), since there's no way of tagging a function parameter as unused in VS</li>
<li><code>/wd4245</code>: to disable warnings about conversions between signed and unsigned integers</li>
<li><code>/wd4611</code>: to disable warnings about using setjmp() with C++</li>
</ul>
<br />
<h3>
Use different compilers</h3>
<div style="text-align: justify;">
The more compilers you try, the more issues they will raise. In GDAL, we have continuous integrations targets that use different versions of GCC, CLang and Microsoft Visual Studio.</div>
<div style="text-align: justify;">
<br /></div>
<h3>
Function annotations</h3>
GCC and CLang offer a set of attributes/annotations that can be added to a function declaration.<br />
<ul>
<li><code>__attribute__((warn_unused_result))</code>: to warn when the return value of a function is not used. This will increase the reliability of your code by ensuring that error conditions are properly dealt with (in the case you deal with errors with return values rather than C++ exceptions)</li>
<li><code>__attribute__((__format__ (__printf__, format_idx, arg_idx)))</code>: to flag a function as behaving like printf() and related functions. The compiler will then check that the arguments passed in the variable list are of the correct type and number with respect to the format string.</li>
<li><code>__attribute__((__sentinel__))</code>: to flag a function taking a variable list of arguments to expect the last argument to be a NULL pointer.</li>
</ul>
<br />
<h3>
Static code analysis</h3>
<div style="text-align: justify;">
Static code analysis is the logical extension of checks done by the compiler, except that more complex checks can be done to try detecting logic errors beyond checks that are strictly needed to compile the code.</div>
<div style="text-align: justify;">
<a href="http://clang-analyzer.llvm.org/">CLang Static Analyzer</a>, an add-on to the LLVM/CLang compiler, is such a tool. I must warn that it has a significant rate of false positive warnings, which with some effort can generally be worked around by adding extra assertions to your code. Or, if it takes you more than 10 seconds to figure out that a warning is in fact a false positive, it is a sign that your code is likely too complex; by simplifying it, you will make your life and the analyzer's easier. Despite false positives, it can find real bugs such as <a href="https://github.com/rouault/gdal_coverage/commit/09480116d5e52e039e73aa2eadae48c5e7658b64">an access beyond buffer bounds</a>, <a href="https://github.com/rouault/gdal_coverage/commit/e88eed95ffe7edeb19910040e816322bb494aa1d">a memory leak</a> or <a href="https://github.com/rouault/gdal_coverage/commit/23f817599ddc4aa324274bd733025a5ed0b66050">the dereferencing of a NULL pointer</a>. I'd recommend enabling the optional alpha.unix.cstring.OutOfBounds and alpha.unix.cstring.BufferOverlap checkers. We finally got to 0 warnings in the GDAL code base. You can even write your own checkers to enforce specific development rules, as <a href="http://cgit.freedesktop.org/libreoffice/core/tree/compilerplugins">the Libreoffice developers did</a>, but this can be rather involved and we haven't gone that far yet in GDAL.</div>
<br />
<div style="text-align: justify;">
<a href="http://cppcheck.sourceforge.net/">cppcheck</a> is another free &amp; open-source C/C++ static analysis tool that can be used, although I found it to be less powerful and slower than CLang Static Analyzer. You can run it with <code>cppcheck --enable=all --force --inconclusive *.cpp</code> and ignore warnings about unused functions.</div>
<br />
<div style="text-align: justify;">
In GDAL, we also use the commercial Coverity Scan tool, whose use is free (as in beer) for free &amp; open-source software. Our experience is that Coverity Scan has a reasonably low rate of false positives (probably around 10%). One of its strengths is its ability to correlate portions of code that are not in the same C/C++ file. In June 2015, we had more than 1100 warnings raised by Coverity Scan. With the help of <a href="https://twitter.com/kurtschwehr">Kurt Schwehr</a>, who fixed several hundred of them, we finally reached 0 (with a bit less than 100 annotations to discard false positives).</div>
<br />
<div style="text-align: justify;">
Side note: as GDAL uses Python for a few of its utilities and its test suite, we use the <a href="https://pypi.python.org/pypi/pyflakes">pyflakes</a> utility to do some basic checks on the Python code.</div>
<br />
<h3>
Automated test suite and code coverage</h3>
<div style="text-align: justify;">
What we mentioned above were static checks, that is to say checks done just by "looking" at the code, not by running it. To cover the dynamic aspect, GDAL has an extensive automated test suite that checks the behaviour of utilities, core functions and drivers. Writing tests is good, but knowing what part of your code the tests cover is even better. In order to do so, we have a continuous integration target that compiles the code with profiling options (the --coverage flag of gcc) so that we can get a report, after test execution, of which lines and branches of code have actually been run. Combined with the <a href="https://gcc.gnu.org/onlinedocs/gcc/Gcov.html">gcov</a> and <a href="http://ltp.sourceforge.net/coverage/lcov.php">lcov</a> utilities, this can produce a <a href="http://rawgit.com/rouault/gdalautotest-coverage-results/master/coverage_html/index.html">nice HTML output</a>. With the default set of test data, 63% of the compiled lines of GDAL are executed at least once (and with some test-driven methodology, we can reach 90% or more in recently developed drivers such as <a href="http://rawgit.com/rouault/gdalautotest-coverage-results/master/coverage_html/frmts/sentinel2/index.html">Sentinel2</a>, <a href="http://rawgit.com/rouault/gdalautotest-coverage-results/master/coverage_html/frmts/wmts/index.html">WMTS</a> or <a href="http://rawgit.com/rouault/gdalautotest-coverage-results/master/coverage_html/ogr/ogrsf_frmts/vdv/index.html">VDV</a>). Some projects that use GitHub / Travis-CI also go with the <a href="https://coveralls.io/">Coveralls</a> service, which integrates well with those, to track code coverage (my tests with Coveralls roughly one year ago were unsuccessful, apparently due to the size of the GDAL code base).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If you decide to fix compiler and static code analysis warnings, I would strongly recommend making sure your test suite covers the code you are fixing, as it is sometimes easy to introduce regressions while trying to fix static warnings.</div>
<br />
<h3>
Dynamic checkers</h3>
<br />
<div style="text-align: justify;">
In C/C++, it is easy to misuse dynamically allocated memory, either by reading or writing outside of the allocated zones, using freed memory or forgetting to free memory. You can run your test suite through the <a href="http://valgrind.org/">Valgrind</a> memory debugger (available on Linux and MacOSX) or <a href="http://www.drmemory.org/">DrMemory</a> (Windows 32 bit). Valgrind is really an excellent tool. You should use it. Unfortunately it slows down execution by a significant factor (10 or more), due to on-the-fly code instrumentation, which can make it impractical for some continuous integration environments where runs are time-limited.<br />
<br />
Another option is to use the <a href="http://clang.llvm.org/docs/AddressSanitizer.html"><code>-fsanitize=address</code></a> flag of recent GCC/CLang versions, which does similar checks to Valgrind, but with the instrumentation done at compile time, which makes the slowdown much more bearable. Other sanitizers such as <a href="http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html"><code>-fsanitize=undefined</code></a> can also be used to catch situations where behaviour left undefined by the C/C++ standards occurs, and where you would otherwise rely on the choices made by the compiler, or on the specific logic of the CPU architecture (as <a href="https://trac.osgeo.org/gdal/ticket/6091">this bug report shows</a>, not all CPU architectures deal the same way with overflows during signed/unsigned conversions). <code><a href="http://clang.llvm.org/docs/ThreadSanitizer.html">-fsanitize=thread</a></code> can also be used to detect issues with thread usage.</div>
<br />
<h3>
Fuzz testing</h3>
<div style="text-align: justify;">
<a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>, also known as AFL, is a fairly recent fuzz-testing tool that has gained a lot of popularity. The principle is that you feed it an initial file that you run through your utility, and AFL applies various random or not-so-random changes to it to try to trigger bugs. This is particularly well suited for command line utilities such as gdalinfo or ogrinfo that take a file as input.</div>
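The mutation loop at the heart of fuzz testing can be sketched in a few lines of Python. This is a toy illustration only: AFL itself is far more sophisticated, driving its mutations with branch-coverage feedback rather than blind randomness, and the function names here are invented for the sketch.

```python
import random

def mutate(data, n_flips=4):
    """Flip a few random bits of the seed input."""
    buf = bytearray(data)
    for _ in range(n_flips):
        pos = random.randrange(len(buf))
        buf[pos] ^= 1 << random.randrange(8)   # flip one bit
    return bytes(buf)

def fuzz(target, seed, iterations=1000):
    """Run the target parser on mutated inputs, collecting crashing samples."""
    crashes = []
    for _ in range(iterations):
        sample = mutate(seed)
        try:
            target(sample)
        except Exception as exc:
            crashes.append((sample, exc))
    return crashes
```

In AFL's case the target is a real binary such as gdalinfo, a "crash" is a signal rather than a Python exception, and the interesting inputs are kept as new seeds.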
<div style="text-align: justify;">
An interesting innovation of AFL with respect to similar tools is that, through compile-time instrumentation, it tracks which code branches have been taken, to determine which changes in the test file cause which code branches to be taken, so as to maximize branch coverage. It can also be used to generate test cases for greater code coverage, by writing out the input file whenever a branch you want covered is hit.<br />
<br />
AFL has for example identified <a href="https://github.com/rouault/gdal_coverage/commit/abf945ba3762790727d36277029d90e32c062517">this divide by zero bug</a> or <a href="https://github.com/rouault/gdal_coverage/commit/7f77d86861f5bb068f7bde5d1c1e4cdab2f47833">this improper use of data types</a> for which the fix is more involved.<br />
<br />
Lately I've played with the <a href="http://volatileminds.net/2015/07/01/advanced-afl-usage.html">afl-clang-fast</a> experimental module of AFL, which requires the code to be compiled with CLang. With special instrumentation (but <a href="https://trac.osgeo.org/gdal/changeset/32677">very simple to put in place</a>) in the GDAL binaries like gdalinfo or ogrinfo, AFL can typically run 10 times faster, reaching a thousand tests per second. Combined with <code>-ftrapv</code> (and possibly other sanitizers such as <code>-fsanitize=undefined</code>), it has for example caught dozens of situations where integer overflow could happen on corrupted data.</div>
<br />
<h3>
Continuous integration </h3>
<div style="text-align: justify;">
To make all the above-mentioned good practices really useful (except perhaps fuzz testing, which is a slow operation), it is highly recommended to adopt continuous integration practices, that is to say, use a mechanism that automates compilation and execution of your test suite each time a change is pushed to the code repository. In GDAL, we have 16 different configurations that are regularly run, using different versions of gcc/clang/Visual Studio, 32 bit vs 64 bit builds, Python 2.X vs Python 3.X, Linux/MacOSX/Windows/Android, a big endian host, C++11 compatibility, a target running the test suite with GCC -fsanitize=address -fsanitize=undefined, and also a CLang Static Analysis target. You can find the badges for those different targets on the <a href="https://github.com/OSGeo/gdal">GitHub mirror</a> home page.<br />
A C++14 target could also be useful, as upgrading to a newer standard can sometimes reveal bugs that only show up at runtime, as <a href="https://trac.osgeo.org/gdal/ticket/6002">this bug report shows</a>.<br />
<br />
To run all those configurations, we use the Travis-CI (all configurations except Visual Studio builds) and AppVeyor (Visual Studio builds) online services (sorry, neither uses free and open-source software). Alternatives such as GitLab-CI using F.O.S.S. exist.</div>
<br />
<h3>
Conclusion</h3>
<div style="text-align: justify;">
When you decide to finally tackle compiler and static code analysis warnings in a code base as large as GDAL, it can be a huge effort (to be evaluated in weeks or months of manpower). But the effort is worth it. It helps uncover real bugs that were lying around, and makes it friendlier for contributors to do further code changes.<br />
<br />
This article has been written by Even Rouault (<a href="http://spatialys.com/">Spatialys</a>), with contributions from Kurt Schwehr.<br />
</div>
<br />Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com0tag:blogger.com,1999:blog-7797431205725571762.post-58297078333752716202015-10-12T05:32:00.000-07:002015-10-12T08:15:39.831-07:00GDAL and OGR utilities as library functions<div style="text-align: justify;">
It has often been stated that the popular <a href="http://gdal.org/">GDAL/OGR</a> C/C++ command line utilities, such as <a href="http://gdal.org/gdal_translate.html">gdal_translate</a>, <a href="http://gdal.org/gdalwarp.html">gdalwarp</a>, <a href="http://gdal.org/ogr2ogr.html">ogr2ogr</a>, etc., are mostly examples of how to use the GDAL/OGR API for your own purposes. The truth is that many people just need their functionality without much additional tuning. Furthermore, over time those utilities have gained a lot of options and tweaks, which makes it time-consuming to understand their workings or reinvent them from scratch. So people repeatedly copied and pasted their source code into their own code, or simply spawned an external process running the utility's binary. None of those approaches was optimal.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The recently approved <a href="https://trac.osgeo.org/gdal/wiki/rfc59.1_utilities_as_a_library">RFC 59.1 "GDAL/OGR utilities as a library"</a>, that will be part of GDAL 2.1, aims at solving those problems by moving the code of the utilities into the main GDAL library.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The main advantages are :</div>
<ul style="text-align: justify;">
<li>the utilities can be easily called from any of the supported languages : C, C++, Python, Java (C# and Perl untested at time of writing, but should work with slight changes).</li>
<li>in-memory datasets can be used for input and output, avoiding the creation of temporary files on permanent storage.</li>
<li>a callback for progress report and cancellation of the processing can be provided.</li>
<li>faster execution for repeated calls (in comparison to the case where an external process was spawned)</li>
</ul>
<div style="text-align: justify;">
Here's a 3-line example in Python that reprojects a GeoTIFF to WebMercator and exports it as a PNG file:</div>
<div style="text-align: justify;">
</div>
<blockquote class="tr_bq">
<div style="text-align: justify;">
from osgeo import gdal </div>
<div style="text-align: justify;">
tmp_ds = gdal.Warp('temp', 'in.tif', format = 'MEM', dstSRS = 'EPSG:3857') </div>
<div style="text-align: justify;">
gdal.Translate('out.png', tmp_ds, format = 'PNG')</div>
</blockquote>
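The progress callback mentioned above is a plain Python callable receiving the completed fraction (0.0 to 1.0), an optional message, and user-supplied data; returning 1 continues the operation and 0 requests cancellation. A minimal sketch follows; the <code>callback</code>/<code>callback_data</code> keyword names follow the GDAL 2.1 Python bindings, so double-check them against your version.

```python
def progress(complete, message, user_data):
    """Record the completed fraction; return 1 to continue, 0 to abort."""
    user_data.append(complete)
    return 1

# Hypothetical usage with the GDAL Python bindings:
# from osgeo import gdal
# history = []
# gdal.Translate('out.png', 'in.tif', format='PNG',
#                callback=progress, callback_data=history)
```

After the call, `history` would hold the progress fractions reported during the translation.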
<div style="text-align: justify;">
The utilities currently available from the library are :</div>
<ul>
<li>gdalinfo (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdalinfo_lib.py">example from Python</a>)</li>
<li>gdal_translate (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdal_translate_lib.py">example from Python</a>)</li>
<li>gdalwarp (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdalwarp_lib.py">example from Python</a>)</li>
<li>ogr2ogr (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_ogr2ogr_lib.py">example from Python</a>)</li>
<li>gdaldem (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdaldem_lib.py">example from Python</a>)</li>
<li>nearblack (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_nearblack_lib.py">example from Python</a>)</li>
<li>gdal_rasterize (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdal_rasterize_lib.py">example from Python</a>)</li>
<li>gdal_grid (<a href="http://svn.osgeo.org/gdal/trunk/autotest/utilities/test_gdal_grid_lib.py">example from Python</a>)</li>
</ul>
<div style="text-align: justify;">
gdalbuildvrt will probably be added to this list as well.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This work has been initiated by <a href="https://github.com/fazam">Faza Mahamood</a> during GSoC 2015 and has been integrated and extended by Even Rouault (<a href="http://spatialys.com/">Spatialys</a>).<br />
<br />
EDIT: as a few people were wondering, the existing command line utilities are kept of course and use the library functions...</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com4tag:blogger.com,1999:blog-7797431205725571762.post-30450693851437030512015-09-23T05:36:00.000-07:002015-09-23T05:36:24.411-07:00GDAL/OGR 2.0.1 and 1.11.3 releasedOn behalf of the GDAL/OGR development team, I am pleased to<br />announce the release of the GDAL/OGR 1.11.3 and 2.0.1 bug fix versions. Each <br />one contains more than 40 bug fixes since 1.11.2 / 2.0.0.<br /><br />The sources for 1.11.3 are available at:<br /><br /> <a href="http://download.osgeo.org/gdal/1.11.3/gdal-1.11.3.tar.xz">http://download.osgeo.org/gdal/1.11.3/gdal-1.11.3.tar.xz</a><br /> <a href="http://download.osgeo.org/gdal/1.11.3/gdal-1.11.3.tar.gz">http://download.osgeo.org/gdal/1.11.3/gdal-1.11.3.tar.gz</a><br /> <a href="http://download.osgeo.org/gdal/1.11.3/gdal1113.zip">http://download.osgeo.org/gdal/1.11.3/gdal1113.zip</a><br /><br />GRASS plugin:<br /> <a href="http://download.osgeo.org/gdal/1.11.3/gdal-grass-1.11.3.tar.gz">http://download.osgeo.org/gdal/1.11.3/gdal-grass-1.11.3.tar.gz</a><br /><br />The sources for 2.0.1 are available at:<br /><br /> <a href="http://download.osgeo.org/gdal/2.0.1/gdal-2.0.1.tar.xz">http://download.osgeo.org/gdal/2.0.1/gdal-2.0.1.tar.xz</a><br /> <a href="http://download.osgeo.org/gdal/2.0.1/gdal-2.0.1.tar.gz">http://download.osgeo.org/gdal/2.0.1/gdal-2.0.1.tar.gz</a><br /> <a href="http://download.osgeo.org/gdal/2.0.1/gdal201.zip">http://download.osgeo.org/gdal/2.0.1/gdal201.zip</a><br /><br />Details on the fixes in those releases are available at:<br /> <a href="http://trac.osgeo.org/gdal/wiki/Release/1.11.3-News">http://trac.osgeo.org/gdal/wiki/Release/1.11.3-News</a><br /> <a href="http://trac.osgeo.org/gdal/wiki/Release/2.0.1-News">http://trac.osgeo.org/gdal/wiki/Release/2.0.1-News</a><br />Even 
Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com1tag:blogger.com,1999:blog-7797431205725571762.post-18308199911496512732015-07-02T13:01:00.000-07:002015-07-02T13:01:14.427-07:00Reliable multithreading is hard<div style="text-align: justify;">
As the race to increase processor speed has stalled during the last decade, CPU manufacturers have instead increased the number of transistors and parallel execution units within a single CPU to keep up with Moore's law: hence multi-core and hyper-threading technologies. This is all good, but contrary to extra MHz, software has to be explicitly designed to deal with those parallelism mechanisms in order to use the full capacity of the CPU. Multi-threaded programming is not a novelty, but even when you are aware of its caveats, it is easy to get caught in trouble, and the resulting problems are not trivial to diagnose and fix.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I've spent a good deal of time those last days trying to understand the reason for a failure that began to occur randomly in the GDAL automated test suite that ran on the Travis continuous integration system. Step 27 of the tests of the JPEG2000 OpenJPEG driver crashed roughly 1/3 to 1/2 of the time. The crashes started just after I merged <a href="https://trac.osgeo.org/gdal/wiki/rfc26_blockcache">RFC 26 - GDAL block cache improvements</a> in the development version (future 2.1.0).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A few words about this global block cache for rasters: a block is a rectangular region of a raster, generally one or several lines (strips), or a square, typically 256x256 pixels. When code fetches pixel values from a raster, GDAL caches the contents of the blocks that intersect the requested area, so that subsequent requests in the same area are faster.</div>
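The caching idea can be sketched as a tiny LRU map. This is a toy illustration only: GDAL's actual global block cache is shared across datasets, bounded in bytes rather than entries, and thread-safe, and the class and method names here are invented for the sketch.

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU cache, keyed e.g. by (band, block_x, block_y)."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self._blocks = OrderedDict()

    def get(self, key, load):
        if key in self._blocks:
            self._blocks.move_to_end(key)         # mark as recently used
        else:
            if len(self._blocks) >= self.max_blocks:
                self._blocks.popitem(last=False)  # evict least recently used
            self._blocks[key] = load(key)         # fetch from the driver
        return self._blocks[key]
```

Eviction is the operation that matters for the bug described below: evicting a block requires touching the band it belongs to, which must therefore still be alive.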
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
My first reaction when seeing the new failures was to try to reproduce them locally on my machine, but that did not work. I also tried on another machine: no better. Then I ran the tests under the Valgrind memory debugger, since random failures are often associated with misuse of dynamic memory, and there was still no way of making the tests crash. This became a hint that multi-threading could be involved.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The main work involved in RFC 26 wasn't primarily about modifying multi-threaded behaviour, but there were indeed a few non-trivial changes done in that area, so as to preserve and even slightly improve performance of the global block cache in a multi-threaded context, by reducing to the minimum the scope of the mutex that protects the shared linked list of cached blocks. So it seemed likely I had introduced a regression in the multi-threaded case with my changes. Strangely, the test step that crashed did not involve multi-threading... But reviewing RFC 26 code again, I could not spot a potential race (and some of the work of this RFC actually consisted in removing some potential old races). However, even with careful review, it is easy to miss threading issues.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So I added a few assertion checks and debug traces just before the point where the crash occurred, but by doing so, of course, the crash did not happen anymore. So I removed some of them, and today, I finally got a crash again, with some useful debug information I had fortunately kept. The reason for the crash was that a block remained in the block cache even after the raster band to which it belonged had been destroyed, which should normally not happen, as at band destruction the cached blocks that belong to a band are removed from the cache. In step 27, due to the saturation of the cache, the block got evicted from the cache (expected behaviour, since the cache has a finite size). Evicting a block requires some interaction with the band to which it belongs. When the band is in a zombie state, a crash is likely to happen. The particular characteristics of this zombie block helped me identify it as coming from the earlier step 15 of the OpenJPEG tests. Interestingly, that test case exercises an optimization in the OpenJPEG driver to do multi-threaded reading of JPEG2000 tiles. Now investigating the OpenJPEG driver, I discovered it violated a general principle of GDAL by concurrently accessing a dataset from several threads in a few places. Before RFC 26, the potential for this violation to result in a crash was low (those tests had run for 2 or 3 years without crashing), although theoretically possible. With the new code of RFC 26, an optimization of TryGetLockedBlockRef() that no longer instantiated the raster band cache caused it to be concurrently initialized later, which is in no way supported. Explanation found. The fix was simple: just protect with a mutex the few places in the OpenJPEG driver where concurrent use of the dataset could happen.</div>
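The shape of that fix can be sketched in Python: serialize access to a shared object whose lazy initialization is not thread-safe. The class below is a stand-in for the dataset/band cache, not GDAL's actual C++ code, and its names are invented for the illustration.

```python
import threading

class SharedDataset:
    """Stand-in for a dataset whose internal cache is lazily built."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = None   # lazily initialized, like a raster band cache

    def read(self, key):
        # Without the lock, two threads could both observe _cache as None
        # and initialize it concurrently -- the kind of race described above.
        with self._lock:
            if self._cache is None:
                self._cache = {}
            if key not in self._cache:
                self._cache[key] = "tile-%d" % key
            return self._cache[key]
```

The lock's scope is deliberately small: it only covers the lazy initialization and cache lookup, mirroring the idea of protecting just the few places where concurrent use could happen.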
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A few lessons :</div>
<ul>
<li style="text-align: justify;">when adding new code, if something fails, the culprit is not necessarily the new code by itself, but it might just reveal potential dormant defects in existing code.</li>
<li style="text-align: justify;">debugging multi-threading issues sometimes looks like quantum physics: when you are looking at the system, you modify its behaviour.</li>
<li style="text-align: justify;">extensive test suites (more than 180 000 lines of Python code for GDAL) are an investment that ends up paying off. In that case, the fact that the test steps do not run in isolation from each other, but are run by a long living process, probably made it possible to observe the bug frequently enough.</li>
<li style="text-align: justify;">continuous integration helps uncovering problems close to their introduction, which is far easier to solve rather than few months later, just before releasing.</li>
</ul>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
While we are talking about multi-threading, a <a href="https://github.com/mapnik/mapnik/pull/2686">change in Mapnik</a> to improve efficiency of GDAL use turned out to cause <a href="https://github.com/mapnik/node-mapnik/issues/437">random crashes when using VRTs with node-mapnik</a>. The root cause was that VRT datasets were opened in the main thread of node-mapnik before being dispatched to worker threads. The issue is that, by default, VRTs open their underlying sources in a shared mode, i.e. if one or several VRTs point to the same source (e.g. a TIFF file), and, important detail, if the opening is done by the same thread, it is opened just once. Which was the case here. So dispatching those VRT dataset handles to multiple threads afterwards can cause concurrent access to the TIFF dataset objects. To solve this, <a href="https://trac.osgeo.org/gdal/ticket/5992">an improvement had been introduced in GDAL 2.0</a> to add the possibility of opening VRT sources in a non-shared mode. It has been discovered today that in the case where you use VRTs whose sources are themselves VRTs, <a href="https://trac.osgeo.org/gdal/ticket/6017">an extra fix was needed</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com1tag:blogger.com,1999:blog-7797431205725571762.post-46885099633165639502015-06-18T12:39:00.001-07:002015-06-18T13:59:29.293-07:00GDAL/OGR 2.0.0 released<div style="text-align: justify;">
On behalf of the <a href="http://www.gdal.org/">GDAL/OGR</a> development team and community, I am pleased to announce the release of GDAL/OGR 2.0.0. GDAL/OGR is a C++ geospatial data access library for raster and vector file formats, databases and web services. It includes bindings for several languages, and a variety of command line tools.</div>
<br />
The 2.0.0 release is a major new feature release with the following highlights:<br />
<br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc46_gdal_ogr_unification">GDAL and OGR driver and dataset unification</a> <br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc31_ogr_64">64-bit integer fields and feature IDs</a><br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc49_curve_geometries">Support for curve geometries</a> <br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc50_ogr_field_subtype">Support for OGR field subtypes</a> (boolean, int16, float32) <br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc51_rasterio_resampling_progress">RasterIO() improvements</a> : resampling and progress callback <br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc52_strict_sql_quoting">Stricter SQL quoting</a> <br />
* <a href="http://trac.osgeo.org/gdal/wiki/rfc53_ogr_notnull_default">OGR not-null constraints and default values</a> <br />
* <a href="https://trac.osgeo.org/gdal/wiki/rfc54_dataset_transactions">OGR dataset transactions</a> <br />
* <a href="https://trac.osgeo.org/gdal/wiki/rfc55_refined_setfeature_deletefeature_semantics">Refined SetFeature() and DeleteFeature() semantics</a> <br />
* <a href="https://trac.osgeo.org/gdal/wiki/rfc56_millisecond_precision">OFTTime/OFTDateTime millisecond accuracy</a><br />
* <a href="https://trac.osgeo.org/gdal/wiki/rfc57_histogram_64bit_count">64bit histogram bucket count</a> <br />
<br />
* New GDAL drivers:<br />
- <a href="http://bellard.org/bpg/">BPG</a>: read-only driver for Better Portable Graphics format (experimental)<br />
- <a href="http://gdal.org/drv_geopackage_raster.html">GPKG</a>: read/write/update capabilities in the unified raster/vector driver <br />
- <a href="http://gdal.org/frmt_kea.html">KEA</a>: read/write driver for KEA format <br />
- <a href="http://gdal.org/frmt_plmosaic.html">PLMosaic</a>: read-only driver for Planet Labs Mosaics API <br />
- <a href="http://www.gdal.org/frmt_various.html#ROI_PAC">ROI_PAC</a>: read/write driver for image formats of JPL's ROI_PAC project <br />
- <a href="http://www-mipl.jpl.nasa.gov/external/vicar.html">VICAR</a>: read-only driver for VICAR format <br />
<br />
* New OGR drivers:<br />
- <a href="http://gdal.org/drv_cloudant.html">Cloudant</a> : read/write driver for Cloudant service <br />
- <a href="http://gdal.org/drv_csw.html">CSW</a>: read-only driver for OGC CSW (Catalog Service for the Web) protocol <br />
- <a href="http://gdal.org/drv_jml.html">JML</a>: read/write driver for OpenJUMP .jml format <br />
- <a href="http://gdal.org/drv_plscenes.html">PLScenes</a>: read-only driver for Planet Labs Scenes API <br />
- <a href="http://gdal.org/ogr/drv_selafin.html">Selafin</a>: read/write driver for the Selafin/Seraphin format<br />
<br />
* Significantly improved drivers: <a href="http://gdal.org/ogr/drv_csv.html">CSV</a>, <a href="http://gdal.org/ogr/drv_geopackage.html">GPKG</a>, <a href="http://gdal.org/ogr/frmt_gtiff.html">GTiff</a>, <a href="http://gdal.org/ogr/frmt_jp2openjpeg.html">JP2OpenJPEG</a>, <a href="http://gdal.org/ogr/drv_mitab.html">MapInfo</a>, <a href="http://gdal.org/ogr/drv_pg.html">PG</a>, <br />
<a href="http://gdal.org/ogr/drv_sqlite.html">SQLite</a><a href="https://www.blogger.com/null">/Spatialite</a><br />
<br />
* Upgrade to EPSG 8.5 database<br />
<br />
* Fix locale related issues when formatting or reading floating point numbers<br />
<br />
<div style="text-align: justify;">
You can also find <a href="http://trac.osgeo.org/gdal/wiki/Release/2.0.0-News">more complete information</a> on the new features and fixes in the 2.0.0.</div>
<br />
The release can be downloaded from:<br />
* <a href="http://download.osgeo.org/gdal/2.0.0/gdal200.zip">http://download.osgeo.org/gdal/2.0.0/gdal200.zip</a> - source as a zip<br />
* <a href="http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.gz">http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.gz</a> - source as .tar.gz<br />
* <a href="http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.xz">http://download.osgeo.org/gdal/2.0.0/gdal-2.0.0.tar.xz</a> - source as .tar.xz<br />
* <a href="http://download.osgeo.org/gdal/2.0.0/gdalautotest-2.0.0.tar.gz">http://download.osgeo.org/gdal/2.0.0/gdalautotest-2.0.0.tar.gz</a> - test suite<br />
* <a href="http://download.osgeo.org/gdal/2.0.0/gdal200doc.zip">http://download.osgeo.org/gdal/2.0.0/gdal200doc.zip</a> - documentation/website<br />
<br />
<div style="text-align: justify;">
As there have been a few changes to the C++ API, and to a lesser extent to the C API, developers are strongly advised to read the <a href="https://svn.osgeo.org/gdal/branches/2.0/gdal/MIGRATION_GUIDE.TXT">migration guide</a>.</div>
Even Rouaulthttp://www.blogger.com/profile/13965870607935959853noreply@blogger.com1