I am seeing a growing buzz around the forthcoming 2016 Italian Cyber Crime Report, which will be officially released on March 15. This report, which is being previewed in several Italian online and print magazines, is “the work of a hundred of experts and a large number of public and private institutions who have shared with CLUSIT information, data, and field experience”.
CLUSIT is an Italian association that brings together several local companies focused on Information Security. Since 2011 it has been very active in compiling a yearly report that outlines the main information security trends from an Italian and international perspective.
I co-authored this report until 2013. In particular, I wrote the section on the cyber attacks targeting Italian entities and, most of all, exactly as I normally do for everybody who asks for it, I shared the raw data of my timelines with the authors of the report so they could derive the statistics, on the condition that they quote the source of the data: my blog hackmageddon.com.
The report has progressively enriched its content with new insights, edition after edition, and my data has only been used for a single section, the one dedicated to the “Analysis of the Main known International Cyber Attacks”.
Starting from 2013, my job (and my wife 🙂 ) became increasingly challenging and time consuming, so I could no longer contribute actively to the report. However, I kept on sharing my data with CLUSIT to feed their statistics (as I do for everybody asking for it).
One day in October 2014 something happened: I stumbled upon the 2014 edition, and with great surprise and disappointment I discovered that my blog hackmageddon.com had magically disappeared from the sources of the data used to derive the statistics, replaced by a generic “OSINT source”: a clear violation of the conditions under which my data can be used.
As a consequence I wrote an email to the authors of the report, in which I withdrew their authorization to use the data derived from my blog for the following editions of the CLUSIT report:
To save you some precious time, here is an English translation:
Dear **** and ****, I hope this message finds you well. I write these few lines since I recently stumbled upon the 2014 Clusit Report and I just wanted to share some thoughts with you. I noticed you keep on using my data, however the source of my blog has disappeared, and I ended up being "merged" with other generic co-authors. My "merging" is more than legitimate, as my contribution has not been in line with the previous years, however, using my data without quoting the source is a clear violation of the rule the data is available with, a rule that is clearly quoted in my blog: "Anyone intending to use the information contained in my posts is free to do so, provided my blog is mentioned in your article." As a consequence, I do not authorize you any longer to use my data for the next editions [of the report], whereas, if you decide to retain the data used for the past editions, you will have to quote the source, as done for the previous reports. Collecting this data is a really tough and time consuming job for me, so I must protect my work. Last but not least, it is really a shame that a similar occurrence only happened in Italy. My data has been used in many reports, but the source has always been quoted, even when used in combination with other sources. Thanks, Paolo.
I received an email of apology in reply, which I won’t publish, describing “a regrettable mistake” and proposing to amend the report. However, this proposal was not enough to repair the violation of my “license agreement” or to give CLUSIT back the permission to use my data for the following editions of the report (that is, from 2015 onward). I believe my email is clear enough, isn’t it? Please notice in particular the following sentence:
I do not authorize you any longer to use my data for the next editions [of the report], whereas, if you decide to retain the data used for the past editions, you will have to quote the source, as done for the previous reports.
I hoped the story would end here and we would all live happily ever after… But, reading the preview of the latest report, and looking more closely at the 2015 edition (covering the cyber attacks until 2014), I suspect that this matter is far from closed, and the reality could be very different…
As I mentioned, the 2016 report will be released on March 15, but a preview is already available (see for instance this link, I am sorry it’s in Italian). The title reads: “CLUSIT Report 2016: Cyber Attacks experience a 30% increase in 2015”. Seriously? And you know why? The CLUSIT report quotes 1,012 known attacks recorded in 2015 vs 873 in 2014. Unfortunately none of the articles published so far quotes the sources for these attacks (and even the 2015 edition does not explain where the database comes from for the data related to 2014). However, the numbers of attacks are curiously similar to the numbers reported in the Hackmageddon statistics for 2015 and 2014.
Hackmageddon.com reported 1,017 attacks in 2015 and 880 in 2014, vs. 1,012 and 873 respectively in the CLUSIT Report.
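For anyone who wants to quantify how close the two sets of figures are, here is a minimal sketch in Python. It uses only the four totals quoted above; the function name `relative_gap` and the dictionary layout are illustrative choices, not something taken from either dataset.

```python
# Yearly totals of known attacks, as quoted in the text above.
HACKMAGEDDON = {2014: 880, 2015: 1017}
CLUSIT = {2014: 873, 2015: 1012}


def relative_gap(reference: int, other: int) -> float:
    """Return the absolute difference between two counts as a fraction of the reference."""
    return abs(reference - other) / reference


for year in sorted(HACKMAGEDDON):
    gap = relative_gap(HACKMAGEDDON[year], CLUSIT[year])
    print(f"{year}: Hackmageddon {HACKMAGEDDON[year]}, CLUSIT {CLUSIT[year]}, gap {gap:.2%}")
```

With these totals the gap between the two sources stays below 1% in both years.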
The Hackmageddon data is available here and, oh dear… The numbers are soooo similar… I believe it’s only a coincidence deriving from the fact that we used the same sources (quite a curious occurrence, considering the number of attacks one could consider). The similarity does not end here, since I found in the same article the distribution of the motivations for 2014 and 2015, and again the data is quite similar to the one reported by Hackmageddon, most of all for 2015:
Of course this could mean that both of us did a good job collecting a similar sample of attacks (most of all in 2015) and also used similar rationales behind the classification. In any case it seems quite an odd coincidence that the numbers are so close. And while my sources are completely open, the same cannot be said for the CLUSIT database (at least in the case of the 2015 edition), since the report does not explain how the database was built after 2013. Please bear in mind that before then, CLUSIT was using the Hackmageddon data: the 2012 and 2013 editions clearly show the reference to hackmageddon.com as the source of the database (as you will see shortly). The same reference suddenly disappeared in the 2014 edition.
I can’t say more at this point since I have not seen the report. In any case, I have a few questions for my colleagues at CLUSIT:
1. I would really appreciate it if you could be so kind as to return the favor I did for you in the past and make available the raw data for 2014 and 2015 used to compile the statistics in the 2016 edition of the report.
2. Although I did not authorize CLUSIT to use my data, the 2015 report was compiled using the Hackmageddon data for the 2011-2013 period. The tables of the database of attacks in the 2014 edition (page 15) and the 2015 edition (page 21) show the same values for the 2011-2013 period. This sample was built using the Hackmageddon data, as clearly indicated in the 2012 edition (footnote 18, page 8) and the 2013 edition (footnotes 4 and 5, page 8). Since the total number of attacks in 2011 and 2012 is the same throughout all the reports (see pictures below), this means that the authors used my 2011-2013 data for the 2015 edition without any right to do so, as the authorization was withdrawn in my email shown above. Can you please explain why you did it?
Report 2012 (data related to 2011 and 1Q 2012)
Report 2013 (data related to 2012)
Report 2014 (data related to 2013)
Report 2015 (data related to 2014)
3. I kindly request that you publish an amendment to the 2015 edition, removing my data, and that you do the same for the 2016 edition if you still used my data for the 2011-2013 period in that report.
Unfortunately I won’t be able to attend the official presentation, but I am keen to receive a satisfactory response from CLUSIT.
Last but not least, let me also add that, should I discover a continued violation of my data usage policy, I will use any means to protect my work. My data is completely open and available to anyone, and all of you guys know how hard it is to build the timelines and keep up with the publication schedule (and you probably noticed that I am tremendously late with the data for February). All this stuff is seriously convincing me that it’s time to reconsider my open access policy for the data.