Marco Castelluccio
A tool to find correlations in crash reports, recently integrated in Socorro (https://crash-stats.mozilla.org/).
Count occurrences of every possible attribute-value pair (e.g. platform=Windows) for each signature
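A minimal sketch of this counting step, assuming each crash report is a flat dict with a `signature` key. The `count_pairs` helper, the field names, and the sample reports are hypothetical and do not reflect Socorro's actual schema.

```python
from collections import Counter, defaultdict


def count_pairs(reports):
    """Count (attribute, value) pairs per signature and across all reports."""
    per_signature = defaultdict(Counter)
    overall = Counter()
    for report in reports:
        # Every field except the signature itself is treated as an attribute.
        pairs = [(k, v) for k, v in report.items() if k != 'signature']
        per_signature[report['signature']].update(pairs)
        overall.update(pairs)
    return per_signature, overall


# Hypothetical sample reports, for illustration only.
reports = [
    {'signature': 'sig_a', 'platform': 'Windows', 'gfx_vendor': 'NVIDIA'},
    {'signature': 'sig_a', 'platform': 'Windows', 'gfx_vendor': 'AMD'},
    {'signature': 'sig_b', 'platform': 'Linux', 'gfx_vendor': 'NVIDIA'},
]
per_signature, overall = count_pairs(reports)
```

Keeping an overall counter next to the per-signature counters makes it possible to compare "in signature" and "overall" percentages later.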
Most discrete SuperSearch fields are considered attributes; some attributes are generated by operating on those fields or on fields from the JSON dump.
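def get_arch(total_virtual_memory, platform, platform_version):
    # Infer the architecture from fields available in the crash report.
    if total_virtual_memory:
        # Less than 2684354560 bytes (2.5 GiB) of virtual memory is treated as x86.
        if int(total_virtual_memory) < 2684354560:
            return 'x86'
        else:
            # Includes the exact boundary value, which previously fell through.
            return 'amd64'
    elif platform == 'Mac OS X':
        return 'amd64'
    else:
        # Otherwise, look for an architecture marker in the platform version string.
        if 'i686' in platform_version:
            return 'x86'
        elif 'x86_64' in platform_version:
            return 'amd64'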
Exclude attribute-value pairs that have low percentages (< 15%)
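The exclusion step could look roughly like the sketch below. It assumes the percentage is the share of reports in which a pair occurs; whether the tool measures this within a signature or overall is not stated here. The `frequent_pairs` helper and the sample counts are hypothetical.

```python
MIN_SHARE = 0.15  # the 15% threshold mentioned above


def frequent_pairs(pair_counts, total_reports, min_share=MIN_SHARE):
    """Return {pair: share} for pairs seen in at least min_share of the reports."""
    return {
        pair: count / total_reports
        for pair, count in pair_counts.items()
        if count / total_reports >= min_share
    }


# Hypothetical counts out of 100 reports.
counts = {('platform', 'Windows'): 90, ('platform', 'Linux'): 10}
kept = frequent_pairs(counts, total_reports=100)
```

Pruning rare pairs early keeps the number of combinations in the next step manageable.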
Generate possible two-level candidates (e.g. platform=Windows && gfx_vendor=NVIDIA)
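One way to build such two-level candidates is to combine the surviving single pairs two at a time. This is a sketch under the assumption that both pairs must come from different attributes; `two_level_candidates` is a hypothetical helper, not the tool's own function.

```python
from itertools import combinations


def two_level_candidates(pairs):
    """Combine single (attribute, value) pairs into conjunctions of two attributes."""
    return [
        (a, b)
        for a, b in combinations(sorted(pairs), 2)
        # Two values of the same attribute cannot hold at once, so skip them.
        if a[0] != b[0]
    ]


pairs = [('platform', 'Windows'), ('gfx_vendor', 'NVIDIA'), ('platform', 'Linux')]
candidates = two_level_candidates(pairs)
```

Each candidate is then counted in the same way as a single pair, with a report matching only if both pairs are present.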
Filter out results:
Many attributes are strongly dependent on others. For example, the presence of a given DLL might be directly linked to a particular Windows version.
The algorithm takes that into account by defining a graph of dependencies. When a dependency is found, the percentage of occurrence is recalculated taking the dependency into account.
(83.90% in signature vs 33.91% overall) Module "bcryptPrimitives.dll"
(100.0% in signature vs 98.44% overall) Module "bcryptPrimitives.dll" if platform_version = 10.0.14393
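The recalculation in the example above can be read as a conditional percentage: both the signature and the overall population are restricted to reports that satisfy the dependency before comparing. The sketch below illustrates that reading with made-up data. The helper names and the report layout are assumptions, and the dependency graph itself is not modelled.

```python
def share(reports, predicate):
    """Fraction of reports satisfying predicate (0.0 for an empty list)."""
    if not reports:
        return 0.0
    return sum(1 for r in reports if predicate(r)) / len(reports)


def conditional_share(reports, predicate, condition):
    """Share of predicate among only the reports that satisfy condition."""
    return share([r for r in reports if condition(r)], predicate)


def has_module(report):
    return 'bcryptPrimitives.dll' in report['modules']


def on_14393(report):
    return report['platform_version'] == '10.0.14393'


# Hypothetical reports: the module only appears on platform_version 10.0.14393.
dll = {'bcryptPrimitives.dll'}
all_reports = [
    {'signature': 'sig_a', 'platform_version': '10.0.14393', 'modules': dll},
    {'signature': 'sig_a', 'platform_version': '10.0.14393', 'modules': dll},
    {'signature': 'sig_b', 'platform_version': '10.0.14393', 'modules': dll},
    {'signature': 'sig_b', 'platform_version': '6.1.7601', 'modules': set()},
    {'signature': 'sig_b', 'platform_version': '6.1.7601', 'modules': set()},
    {'signature': 'sig_b', 'platform_version': '6.1.7601', 'modules': set()},
]
in_sig = [r for r in all_reports if r['signature'] == 'sig_a']

# (share in signature, share overall), before and after conditioning.
raw = (share(in_sig, has_module), share(all_reports, has_module))
adjusted = (conditional_share(in_sig, has_module, on_14393),
            conditional_share(all_reports, has_module, on_14393))
```

In this toy data the raw gap between signature and overall disappears once only reports on that platform version are compared, which mirrors how the module in the example stops looking distinctive.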