Library: switch to 64bit implementation for hash-chaining (see #722)

⚠ __This is a problematic decision__
It temporarily **breaks compatibility with 32bit** until this issue is resolved.

== Explanation ==
Lumiera relies on a mix of the Standard library and Lib-Boost for the calculation of hash values.
Before C++11, the Standard did not provide any hashtable implementation; meanwhile, the STL
offers several hash-based containers and a framework for hashing, which unfortunately
is incomplete and cumbersome to use.
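To illustrate the "cumbersome" part: `std::hash` can be specialised for a user type, but the Standard offers no `hash_combine`, so every project ends up hand-writing the combining step. A minimal sketch, assuming a hypothetical key type `Key` and a hypothetical helper `combineHash` (which here uses the classic, weak fallback formula known from Boost):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical composite key type, for illustration only.
struct Key
{
    std::string name;
    int version;

    bool operator== (Key const& other) const
    {
        return name == other.name && version == other.version;
    }
};

// Hypothetical helper: the Standard provides no hash_combine,
// so it has to be written by hand. This is the classic (weak)
// formula also used as fallback by Boost.
inline void
combineHash (std::size_t& seed, std::size_t value)
{
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Specialising std::hash is required before Key can be used
// as element of std::unordered_set or key of std::unordered_map.
namespace std {
    template<>
    struct hash<Key>
    {
        size_t operator() (Key const& key) const noexcept
        {
            size_t seed = 0;
            combineHash (seed, hash<string>{}(key.name));
            combineHash (seed, hash<int>{}(key.version));
            return seed;
        }
    };
}
```

With this boilerplate in place, `std::unordered_set<Key>` works as expected; note however that the resulting values still depend on the (unspecified) `std::hash<std::string>` of the platform.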

The C++ Committee has held endless discussions, yet was not able to settle
on a convincing solution free of major drawbacks in one respect or another.

This situation is problematic, since Lumiera relies heavily on the technique
of building stable systematic identifiers based on chained hash values.
It is thus essential to use a strong, reliable and portable hash function.
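Hash chaining, as used for such identifiers, folds the hash value of each part into a running value, so that the result depends on all parts and on their order. A minimal sketch, assuming a hypothetical helper `chainHash()` built on the 64bit mixing step that this changeset extracts from Boost 1.67 (the name `combine64` is likewise only for illustration):

```cpp
#include <cstdint>
#include <initializer_list>

// 64bit mixing step, following hash_combine_impl from Boost 1.67
inline void
combine64 (std::uint64_t& h, std::uint64_t k)
{
    const std::uint64_t m = 0xc6a4a7935bd1e995;
    const int r = 47;
    k *= m;
    k ^= k >> r;
    k *= m;
    h ^= k;
    h *= m;
    h += 0xe6546b64;   // arbitrary offset, prevents 0 from hashing to 0
}

// Hypothetical helper: derive a systematic identifier by chaining
// the hash values of its parts onto a seed, in the given order.
inline std::uint64_t
chainHash (std::uint64_t seed, std::initializer_list<std::uint64_t> partHashes)
{
    std::uint64_t id = seed;
    for (std::uint64_t part : partHashes)
        combine64 (id, part);
    return id;
}
```

Since all arithmetic is done on `uint64_t`, the result is fully determined by the inputs, independent of the width of `size_t`.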

But unfortunately...
 * the standard fallback solution is known to be weak;
 * Lib-Boost automatically uses stronger implementations on 64bit systems;
 * this implies that hash values **are non-portable**.
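Non-portability shows up even with the simple fallback formula alone, because its result depends on the width of `size_t`. A minimal sketch, emulating a 32bit and a 64bit `size_t` with the fixed-width types `uint32_t` and `uint64_t` (the template `fallbackCombine` is hypothetical, for illustration only):

```cpp
#include <cstdint>

// The classic (weak) fallback formula, parametrised on the word size.
// On a real platform, Word would be size_t, i.e. 32 or 64 bits wide.
template<typename Word>
Word
fallbackCombine (Word seed, Word value)
{
    seed ^= value + Word(0x9e3779b9) + (seed << 6) + (seed >> 2);
    return seed;
}
```

For a seed of `0xFFFFFFFF`, the bits shifted out by `seed << 6` are lost in 32bit arithmetic but retained in 64bit arithmetic, so the same inputs yield different hash values; only the lower 32 bits agree.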

As the Lumiera project currently has no developer time to expend on such a
difficult and deep topic of fundamental research, today I decided to go down
the path of least resistance and **effectively abandon any system
that cannot compile and use the 64bit `hash_combine` implementation**.

This changeset extracts code from Lib-Boost 1.67 and adds a static assertion
to **break compilation** on non-64bit platforms (whatever this means).
Fischlurch 2024-11-17 22:40:47 +01:00
parent a20e233ca0
commit e618493829
7 changed files with 200 additions and 66 deletions


@ -75,11 +75,11 @@
#define LUMIERA_QUERY_H
#include "lib/hash-combine.hpp"
#include "lib/typed-counter.hpp"
#include "lib/iter-adapter.hpp"
#include "lib/query-text.hpp"
#include "lib/query-util.hpp"
#include "lib/hash-value.h"
#include "lib/nocopy.hpp"
#include "lib/symbol.hpp"
#include "lib/util.hpp"

src/lib/hash-combine.hpp

@ -0,0 +1,108 @@
/*
HASH-COMBINE.hpp - hash chaining function extracted from LibBoost
Copyright (C)
2012, Hermann Vosseler <Ichthyostega@web.de>
  **Lumiera** is free software; you can redistribute it and/or modify it
  under the terms of the GNU General Public License as published by the
  Free Software Foundation; either version 2 of the License, or (at your
  option) any later version. See the file COPYING for further details.
======================================================================
NOTE: this header adapts implementation code from LibBoost 1.67
// Copyright 2005-2014 Daniel James.
// Distributed under the Boost Software License, Version 1.0.
// (See http://www.boost.org/LICENSE_1_0.txt)
//
// Based on Peter Dimov's proposal
// http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf
// issue 6.18.
//
// This also contains public domain code from MurmurHash. From the MurmurHash header:
//
// MurmurHash3 was written by Austin Appleby, and is placed in the public
// domain. The author hereby disclaims copyright to this source code.
======================================================================
*/
/** @file hash-combine.hpp
** Hash combine function extracted from LibBoost 1.67
** Combine two hash values to form a composite depending on both.
** @todo 2024 the Lumiera project has yet to decide how to approach
** portability of hash values, and the related performance issues.
** This code was directly integrated into the code base to ensure
** a stable implementation and reproducible hash values.
** ///////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 uniform uses of hash values
*/
#ifndef LIB_HASH_COMBINE_H
#define LIB_HASH_COMBINE_H
#include "lib/hash-value.h"
#include "lib/integral.hpp"
#include <climits>
namespace lib {
namespace hash{
/** meld the additional hash value into the given
* base hash value. This is the standard formula
* used by Lib-Boost to combine the hash values
* of parts into a composite.
*/
inline void
combine (size_t & combinedHash, size_t additionalHash)
{
#if false //////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 : Decide what stance to take towards portability
combinedHash ^= additionalHash
+ 0x9e3779b9
+ (combinedHash<<6)
+ (combinedHash>>2);
}
#endif /////////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 : (End) weak but portable fall-back code
/////////////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 : Using the stronger Boost-impl for 64bit platforms
//// see: Boost 1.67 <include>/boost/container_hash/hash.hpp
///
/*
// Don't define 64-bit hash combine on platforms without 64 bit integers,
// and also not for 32-bit gcc as it warns about the 64-bit constant.
#if !defined(BOOST_NO_INT64_T) && \
!(defined(__GNUC__) && ULONG_MAX == 0xffffffff)
inline void hash_combine_impl(boost::uint64_t& h,
boost::uint64_t k)
{
*/
static_assert (sizeof (void*) * CHAR_BIT == 64, "TODO 2024 : decide what to do about portability");
static_assert (sizeof (size_t) == sizeof(uint64_t));
uint64_t& h = combinedHash;
uint64_t k = additionalHash;
const uint64_t m{0xc6a4a7935bd1e995};
const int r = 47;
k *= m;
k ^= k >> r;
k *= m;
h ^= k;
h *= m;
// Completely arbitrary number, to prevent 0's
// from hashing to 0.
h += 0xe6546b64;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 : (End) Code extracted from Boost 1.67
//
//
//
}} // namespace lib::hash
#endif /*LIB_HASH_COMBINE_H*/
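The additive constant at the end of the extracted code ("to prevent 0's from hashing to 0") matters for chaining: without it, any chain of zero hash values would stay at zero, so chains of different length would collide. A minimal sketch comparing the extracted mixing step with a hypothetical variant lacking the constant (the names `mix64` and `chainZeros` are illustrative only):

```cpp
#include <cstdint>

// Mixing step of the code extracted from Boost 1.67; `withOffset` toggles
// the final additive constant (the `false` case is hypothetical, for comparison).
inline std::uint64_t
mix64 (std::uint64_t h, std::uint64_t k, bool withOffset)
{
    const std::uint64_t m = 0xc6a4a7935bd1e995;
    const int r = 47;
    k *= m;
    k ^= k >> r;
    k *= m;
    h ^= k;
    h *= m;
    if (withOffset)
        h += 0xe6546b64;
    return h;
}

// Chain `count` zero-valued hashes onto a zero seed.
inline std::uint64_t
chainZeros (int count, bool withOffset)
{
    std::uint64_t h = 0;
    for (int i = 0; i < count; ++i)
        h = mix64 (h, 0, withOffset);
    return h;
}
```

Without the offset every such chain collapses to 0; with it, each additional element changes the result, since multiplication by the odd constant `m` is invertible modulo 2^64.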


@ -26,6 +26,10 @@
** This header defines the basic hash value types and provides some simple
** utilities to support working with hash values.
**
** @todo 11/2024 : to ensure a strong and reproducible implementation of hash-chaining,
** the implementation of LibBoost is used directly. This breaks portability.
** ///////////////////////////////////////////////////////////////////////////////TICKET #722
** @see hash-combine.hpp
** @see HashIndexed
**
*/
@ -49,6 +53,7 @@ typedef lumiera_uid* LumieraUid;
#ifdef __cplusplus /* =========== C++ definitions ====================== */
#include <climits>
namespace lib {
@ -59,57 +64,8 @@ namespace lib {
typedef lumiera_uid* LUID;
namespace hash {
/** meld the additional hash value into the given
* base hash value. This is the standard formula
* used by the STL and Boost to combine the
* hash values of parts into a composite.
*/
inline void
combine (size_t & combinedHash, size_t additionalHash)
{
combinedHash ^= additionalHash
+ 0x9e3779b9
+ (combinedHash<<6)
+ (combinedHash>>2);
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////TICKET #722 : Boost uses a stronger impl here on 64bit platforms
/// see: Boost 1.67 <include>/boost/container_hash/hash.hpp
//////////////////////////////////////TICKET #722 : hash_combine utility extracted into separate header 11/2024
///
/*
// Don't define 64-bit hash combine on platforms without 64 bit integers,
// and also not for 32-bit gcc as it warns about the 64-bit constant.
#if !defined(BOOST_NO_INT64_T) && \
!(defined(__GNUC__) && ULONG_MAX == 0xffffffff)
inline void hash_combine_impl(boost::uint64_t& h,
boost::uint64_t k)
{
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
const int r = 47;
k *= m;
k ^= k >> r;
k *= m;
h ^= k;
h *= m;
// Completely arbitrary number, to prevent 0's
// from hashing to 0.
h += 0xe6546b64;
}
#endif // BOOST_NO_INT64_T
*/
//
// WIP more utils to come here....
}
} // namespace lib
#endif /* C++ */
#endif /*LIB_HASH_UTIL_H*/


@ -32,6 +32,7 @@
#include "steam/engine/job-ticket.hpp"
#include "vault/gear/nop-job-functor.hpp"
#include "lib/hash-combine.hpp"
#include "lib/depend.hpp"
#include "lib/util.hpp"


@ -37,8 +37,9 @@
#include "lib/hash-standard.hpp"
#include "vault/gear/job.h"
#include "lib/hash-combine.hpp"
#include "lib/time/timevalue.hpp"
#include "vault/gear/job.h"
#include <string>


@ -44,8 +44,8 @@
#include "lib/test/test-helper.hpp"
#include "lib/time/timevalue.hpp"
#include "vault/real-clock.hpp"
#include "lib/hash-combine.hpp"
#include "lib/null-value.hpp"
#include "lib/hash-value.h"
#include "lib/depend.hpp"
#include "lib/util.hpp"


@ -56608,7 +56608,24 @@
</ul>
</body>
</html></richcontent>
<node BACKGROUND_COLOR="#f8f1cb" COLOR="#a50125" CREATED="1701698621411" ID="ID_1690934543" MODIFIED="1701698648712" TEXT="Caution: lib::hash::combine is weak">
</node>
<node CREATED="1701698191775" ID="ID_1333941621" MODIFIED="1701698305483" TEXT="we have our LUID, which is 128bit">
<icon BUILTIN="bell"/>
<node CREATED="1701698271332" ID="ID_615812427" MODIFIED="1701698278751" TEXT="performance implications unclear"/>
<node CREATED="1701698280027" ID="ID_786194248" MODIFIED="1701698299569" TEXT="there is no framework whatsoever for this (hasher, combiner, container adapter)"/>
</node>
<node BACKGROUND_COLOR="#e0ceaa" COLOR="#690f14" CREATED="1701698215387" ID="ID_1138411401" MODIFIED="1731978172081" TEXT="hash values are platform-dependent (problem: tests)">
<richcontent TYPE="NOTE"><html>
<head/>
<body>
<p>
De facto, development now happens on 64-bit only. Meanwhile I have several tests that check hash values directly; these would all break on 32bit.
</p>
</body>
</html></richcontent>
<icon BUILTIN="messagebox_warning"/>
</node>
<node BACKGROUND_COLOR="#f8f1cb" COLOR="#a50125" CREATED="1701698621411" ID="ID_1690934543" MODIFIED="1731978172076" TEXT="Caution: lib::hash::combine is weak">
<icon BUILTIN="messagebox_warning"/>
<node CREATED="1701698649934" ID="ID_1402802007" MODIFIED="1701698665460" TEXT="this is the fallback implementation from boost::hash"/>
<node CREATED="1701698666056" ID="ID_1059722984" MODIFIED="1701698723192" TEXT="but on 64bit, boost::hash uses a stronger impl">
@ -56621,23 +56638,53 @@
</body>
</html></richcontent>
</node>
</node>
</node>
<node CREATED="1701698191775" ID="ID_1333941621" MODIFIED="1701698305483" TEXT="we have our LUID, which is 128bit">
<icon BUILTIN="bell"/>
<node CREATED="1701698271332" ID="ID_615812427" MODIFIED="1701698278751" TEXT="performance implications unclear"/>
<node CREATED="1701698280027" ID="ID_786194248" MODIFIED="1701698299569" TEXT="there is no framework whatsoever for this (hasher, combiner, container adapter)"/>
</node>
<node BACKGROUND_COLOR="#e0ceaa" COLOR="#690f14" CREATED="1701698215387" ID="ID_1138411401" MODIFIED="1701698269652" TEXT="hash values are platform-dependent (problem: tests)">
<node BACKGROUND_COLOR="#e0ceaa" COLOR="#690f14" CREATED="1731973136024" ID="ID_390782277" MODIFIED="1731978413422" TEXT="desirable: a stable solution for Lumiera">
<icon BUILTIN="yes"/>
<node CREATED="1731973156734" ID="ID_417850384" MODIFIED="1731973171130" TEXT="thus use lib::hash::combine()"/>
<node COLOR="#5b280f" CREATED="1731973171834" ID="ID_163385" MODIFIED="1731973359905" TEXT="unfortunately this function is implemented naively (and is thus weak)">
<richcontent TYPE="NOTE"><html>
<head/>
<body>
<p>
De facto, development now happens on 64-bit only. Meanwhile I have several tests that check hash values directly; these would all break on 32bit.
...back then I picked the common implementation (probably from Boost, but without looking closely enough); in fact Boost has (at least) two optimised variants, and it is opaque which one is used when. The Boost documentation, however, explicitly warns that hash values are neither stable nor portable.
</p>
</body>
</html></richcontent>
<icon BUILTIN="messagebox_warning"/>
<arrowlink DESTINATION="ID_1262455046" ENDARROW="Default" ENDINCLINATION="64;-1;" ID="Arrow_ID_316550933" STARTARROW="None" STARTINCLINATION="73;0;"/>
<icon BUILTIN="stop-sign"/>
</node>
<node COLOR="#435e98" CREATED="1731973210278" ID="ID_1845999791" MODIFIED="1731978401215" TEXT="most places use the Boost hash">
<icon BUILTIN="info"/>
<node CREATED="1731973229348" ID="ID_1388359166" MODIFIED="1731973234543" TEXT="and are thus fast + good"/>
<node BACKGROUND_COLOR="#e0ceaa" COLOR="#690f14" CREATED="1731973234992" ID="ID_1262455046" MODIFIED="1731973359905" TEXT="but not portable">
<linktarget COLOR="#a9b4c1" DESTINATION="ID_1262455046" ENDARROW="Default" ENDINCLINATION="64;-1;" ID="Arrow_ID_316550933" SOURCE="ID_163385" STARTARROW="None" STARTINCLINATION="73;0;"/>
<icon BUILTIN="broken-line"/>
</node>
</node>
<node BACKGROUND_COLOR="#ccb59b" COLOR="#6e2a38" CREATED="1731973383487" ID="ID_686530981" MODIFIED="1731978382840" TEXT="pragmatic approach: effectively restrict to 64bit for the time being">
<linktarget COLOR="#911843" DESTINATION="ID_686530981" ENDARROW="Default" ENDINCLINATION="269;545;" ID="Arrow_ID_1170062700" SOURCE="ID_310342175" STARTARROW="None" STARTINCLINATION="355;-670;"/>
<font ITALIC="true" NAME="SansSerif" SIZE="14"/>
<icon BUILTIN="yes"/>
<node CREATED="1731973407210" ID="ID_1673261044" MODIFIED="1731973435962" TEXT="besides, the current situation amounts to this anyway..."/>
<node CREATED="1731973436964" ID="ID_633739202" MODIFIED="1731973675195" TEXT="be it only for lack of spare capacity">
<richcontent TYPE="NOTE"><html>
<head/>
<body>
<p>
For many years now I have been trying to cut detours short without doing too much damage. Portability, platform testing, releases, preparing the documentation, the build system, more complete tests ...
</p>
<p>
I would rather sacrifice things that count as a &#187;value&#171; and confine myself to the core of my vision
</p>
</body>
</html></richcontent>
</node>
<node CREATED="1731973676160" ID="ID_1337884217" MODIFIED="1731973702696" TEXT="if at all, then better explicitly, with an error message"/>
<node BACKGROUND_COLOR="#e0ceaa" COLOR="#690f14" CREATED="1731973713779" ID="ID_1290416875" MODIFIED="1731973728281" TEXT="right now, the circumstances are as favourable as they will ever be">
<icon BUILTIN="ksmiletris"/>
</node>
</node>
</node>
</node>
</node>
<node BACKGROUND_COLOR="#d2beaf" COLOR="#5c4d6e" CREATED="1701698319632" ID="ID_1255956298" MODIFIED="1701698326482" TEXT="meaning of the hash values">
@ -56968,7 +57015,28 @@
</node>
<node BACKGROUND_COLOR="#eee5c3" COLOR="#990000" CREATED="1728786962056" ID="ID_231763184" MODIFIED="1728787040943" TEXT="verification methods">
<icon BUILTIN="flag-yellow"/>
<node CREATED="1728786973094" ID="ID_899991782" MODIFIED="1728786978286" TEXT="data checksum"/>
<node CREATED="1728786973094" ID="ID_899991782" MODIFIED="1728786978286" TEXT="data checksum">
<node BACKGROUND_COLOR="#eee5c3" COLOR="#990000" CREATED="1731947548990" ID="ID_589371487" MODIFIED="1731947561331" TEXT="compute">
<icon BUILTIN="flag-yellow"/>
<node CREATED="1731972983573" ID="ID_690852884" MODIFIED="1731972994119" TEXT="technology : hash chaining">
<node BACKGROUND_COLOR="#f8f1cb" COLOR="#a50125" CREATED="1731972995891" ID="ID_591783472" LINK="https://issues.lumiera.org/ticket/722" MODIFIED="1731973048277" TEXT="#722 : totally muddled situation">
<icon BUILTIN="messagebox_warning"/>
</node>
<node BACKGROUND_COLOR="#fafe99" COLOR="#fa002a" CREATED="1731978226350" ID="ID_310342175" MODIFIED="1731978382840">
<richcontent TYPE="NODE"><html>
<head/>
<body>
<p>
Decision: from now on, use the <b>64bit implementation from Boost</b>
</p>
</body>
</html></richcontent>
<arrowlink COLOR="#911843" DESTINATION="ID_686530981" ENDARROW="Default" ENDINCLINATION="269;545;" ID="Arrow_ID_1170062700" STARTARROW="None" STARTINCLINATION="355;-670;"/>
<icon BUILTIN="clanbomber"/>
</node>
</node>
</node>
</node>
<node CREATED="1728786979373" ID="ID_772580400" MODIFIED="1728786987393" TEXT="metadata detection"/>
<node CREATED="1728786988308" ID="ID_103079985" MODIFIED="1728786992576" TEXT="predicates">
<node CREATED="1728779817892" ID="ID_882765451" MODIFIED="1728779971755" TEXT="isSane">