A Few Questions for the Authors of the 2016 Italian Cyber Crime Report

I am seeing a growing buzz around the forthcoming 2016 Italian Cyber Crime Report, which will be officially released on March 15. This report, which is being previewed in several Italian online and print magazines, is “the work of a hundred experts and a large number of public and private institutions who have shared with CLUSIT information, data, and field experience”.

CLUSIT is an Italian association that brings together several local companies focused on information security. Since 2011 it has been very active in compiling a yearly report that outlines the main information security trends from an Italian and international perspective.

I co-authored this report until 2013. In particular, I wrote the section on the cyber attacks targeting Italian entities and, most importantly, I shared the raw data of my timelines with the authors of the report so they could derive the statistics, exactly as I normally do for anyone who asks. I did so on one condition: that the source of the data, my blog hackmageddon.com, be quoted.

The report has progressively enriched its content with new insights, edition after edition, and my data has only been used for a single section, the one dedicated to the “Analysis of the Main known International Cyber Attacks”.

Starting from 2013, my job (and my wife 🙂 ) became increasingly challenging and time consuming, so I could no longer contribute actively to the report. However, I kept sharing my data with CLUSIT to feed their statistics (as I do for everybody who asks for it).

One day in October 2014 something happened: I stumbled upon the 2014 edition, and with great surprise and disappointment I discovered that my blog hackmageddon.com had magically disappeared from the sources of the data used to derive the statistics, replaced by a generic “OSINT source”: a clear violation of the conditions under which my data can be used.

As a consequence, I wrote an email to the authors of the report, in which I withdrew their authorization to use the data derived from my blog for the following editions of the CLUSIT report:

[Image: the original email]

I will save you some of your precious time, so here is an English translation!

Dear **** and ****

I hope this message finds you well.
I am writing these few lines because I recently stumbled upon the 2014 Clusit Report and I just wanted to share some thoughts with you.

I noticed that you keep using my data; however, the reference to my blog as the source has disappeared, and I ended up being "merged" with other generic co-authors.
My "merging" is more than legitimate, as my contribution has not been in line with previous years. However, using my data without quoting the source is a clear violation of the rule under which the data is made available, a rule that is clearly stated on my blog:

"Anyone intending to use the information contained in my posts is free to do so, provided my blog is mentioned in your article."

As a consequence, I do not authorize you any longer to use my data for the next editions [of the report], whereas, if you decide to retain the data used for the past editions, you will have to quote the source, as done for the previous reports.

Collecting this data is a really tough and time-consuming job for me, so I must protect my work.
Last but not least, it is really a shame that something like this has happened only in Italy. My data has been used in many reports, but the source has always been quoted, even when used in combination with other sources.

Thanks, 

Paolo.

I received an email of apology in reply, which I won’t publish, describing “a regrettable mistake” and proposing to amend the report. However, this proposal was not enough to repair the violation of my “license agreement” and give CLUSIT back the permission to use my data for the following editions of the report (that is, from 2015 onward). I believe my email is clear enough, isn’t it? Please note in particular the following sentence:

I do not authorize you any longer to use my data for the next editions [of the report], whereas, if you decide to retain the data used for the past editions, you will have to quote the source, as done for the previous reports.

I hoped the story would end here and we would all live happily ever after… But, reading the preview of the latest report, and looking more closely at the 2015 edition (covering the cyber attacks up to 2014), I suspect that this matter is far from closed, and the reality could be very different…

As I mentioned, the 2016 report will be released on March 15, but a preview is already available (see for instance this link, I am sorry it’s in Italian). The title reads: “CLUSIT Report 2016: Cyber Attacks experience a 30% increase in 2015”. Seriously? And you know why? The CLUSIT report quotes 1,012 known attacks recorded in 2015 vs. 873 in 2014. Unfortunately, none of the articles published so far quotes the sources for these attacks (and even the 2015 edition does not explain where the database for the 2014 data comes from). However, the numbers of attacks are curiously similar to the numbers reported in the Hackmageddon statistics for 2015 and 2014.

Hackmageddon.com reported 1,017 attacks in 2015 and 880 in 2014, vs. 1,012 and 873 respectively reported by the CLUSIT Report.
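To put a number on how close the two totals are, here is a minimal sketch that computes the gap between the counts quoted above, both in absolute terms and as a percentage of the Hackmageddon total. The dictionary and function names are only illustrative; the figures are the ones quoted in this post.

```python
# Attack totals as quoted in this post (year -> number of attacks).
HACKMAGEDDON = {2014: 880, 2015: 1017}
CLUSIT = {2014: 873, 2015: 1012}


def count_gaps(reference, other):
    """Return {year: (absolute gap, gap as a % of the reference count)}."""
    gaps = {}
    for year in sorted(reference):
        diff = reference[year] - other[year]
        gaps[year] = (diff, 100.0 * diff / reference[year])
    return gaps


if __name__ == "__main__":
    for year, (diff, pct) in count_gaps(HACKMAGEDDON, CLUSIT).items():
        print(f"{year}: gap of {diff} attacks "
              f"({pct:.2f}% of the Hackmageddon total)")
```

With these figures the gap is 7 attacks for 2014 and 5 for 2015, that is, less than 1% of the Hackmageddon total in both years.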

The Hackmageddon data is available here and, oh dear… The numbers are soooo similar… I believe it’s only a coincidence deriving from the fact that we used the same sources (quite a curious occurrence, considering the number of attacks one could consider). The similarity does not end here, since I found in the same article the distribution of the motivations for 2014 and 2015, and again the data is quite similar to that reported by Hackmageddon, especially for 2015:

Motivations 2014 vs 2015

Motivation     2014 Hackmageddon   2014 Clusit   2015 Hackmageddon   2015 Clusit
Cyber Crime    62.3%               60%           67%                 68%
Hacktivism     24.9%               27%           20.8%               21%
Espionage      10.2%               8%            9.8%                9%
Cyber War      2.5%                5%            2.4%                2%
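The same comparison can be made for the motivations. The sketch below takes the percentages from the table above and computes, for each year, the gap in percentage points between the two sources for every category, plus the average gap. The names are only illustrative, and since the Clusit figures are rounded to whole percentages, gaps below half a point are within rounding.

```python
# Motivation shares (in %) from the table above:
# year -> motivation -> (Hackmageddon, Clusit)
SHARES = {
    2014: {
        "Cyber Crime": (62.3, 60.0),
        "Hacktivism": (24.9, 27.0),
        "Espionage": (10.2, 8.0),
        "Cyber War": (2.5, 5.0),
    },
    2015: {
        "Cyber Crime": (67.0, 68.0),
        "Hacktivism": (20.8, 21.0),
        "Espionage": (9.8, 9.0),
        "Cyber War": (2.4, 2.0),
    },
}


def category_gaps(year):
    """Absolute gap, in percentage points, for each motivation."""
    return {name: abs(hm - cl) for name, (hm, cl) in SHARES[year].items()}


def mean_gap(year):
    """Average absolute gap across all motivations for one year."""
    gaps = category_gaps(year)
    return sum(gaps.values()) / len(gaps)


if __name__ == "__main__":
    for year in sorted(SHARES):
        print(f"{year}: mean gap of {mean_gap(year):.1f} percentage points")
        for name, gap in category_gaps(year).items():
            print(f"  {name}: {gap:.1f}")
```

On these figures the average gap is roughly 2.3 percentage points for 2014 and 0.6 for 2015, which is why the 2015 distributions look so close.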

Of course this could mean that both of us did a good job collecting a similar sample of attacks (especially in 2015) and also used a similar rationale for the classification. In any case, it seems quite an odd coincidence that the numbers are so close. And while my sources are completely open, the same cannot be said for the CLUSIT database (at least in the case of the 2015 edition), since the report does not explain how the database was built after 2013. Please bear in mind that before then, CLUSIT was using the Hackmageddon data: the 2012 and 2013 editions clearly show the reference to hackmageddon.com as the source of the database (as you will see shortly). This is the same reference that suddenly disappeared in the 2014 edition.

I can’t say more at this point since I have not seen the report. In any case, I have a few questions for my colleagues at CLUSIT:

1. I would really appreciate it if you could be so kind as to return the favor I did for you in the past and make available the raw data behind the statistics for 2014 and 2015, used to compile the 2016 edition of the report.

2. Although I did not authorize CLUSIT to use my data, the 2015 report was compiled using the Hackmageddon data for the 2011-2013 period. The tables of the database of attacks in the 2014 (page 15) and 2015 (page 21) editions show the same values for the 2011-2013 period. This sample was built using the Hackmageddon data, as clearly indicated in the 2012 (footnote 18, page 8) and 2013 (footnotes 4 and 5, page 8) editions. Since the total number of attacks in 2011 and 2012 is the same throughout all the reports (see pictures below), this means that the authors used my 2011-2013 data for the 2015 edition without any right to do so, as the authorization was withdrawn in my email shown above. At this point, can you please explain why you did it?

[Image] Report 2012 (data related to 2011 and 1Q 2012)

[Image] Report 2013 (data related to 2012)

[Image] Report 2014 (data related to 2013)

[Image] Report 2015 (data related to 2014)

3. I kindly ask you to publish an amendment to the 2015 edition, removing my data, and to do the same for the 2016 edition in case you still used my data for the 2011-2013 period in that report.

Unfortunately I won’t be able to attend the official presentation, but I am keen to receive a satisfactory response from CLUSIT.

Last but not least, let me also add that, should I discover a continued violation of my data usage policy, I will use any means to protect my work. My data is completely open and available to anyone, and all of you know how hard it is to build the timelines and keep up with the publication schedule (and you have probably noticed that I am tremendously late with the data for February). All this is seriously convincing me that it’s time to reconsider my open access policy for the data.

3 thoughts on “A Few Questions for the Authors of the 2016 Italian Cyber Crime Report”

  • March 9, 2016 at 9:08 am

    It’s extremely hard work, Paolo. It’s unethical to use your data in such a way.

  • March 10, 2016 at 4:31 am

    Dear Paolo,
    we are sorry about what you wrote, and it surprised us greatly: if you believe we have wronged you, why not contact us for a direct clarification, which would have resolved the matter immediately? The concept of “responsible disclosure” is certainly not foreign to you!

    We want to reassure you right away that the data we present in the new Report takes no account whatsoever of your database or of your related analyses. The Clusit research is original in terms of the open sources analyzed, the incident classification criteria applied, and the results of the analyses carried out. Below you will find the methodology and the sources that provide the necessary clarifications.

    As for the citation, in this Report as in the previous ones, of data processed *also* with your contribution in the years when you were involved, we are in fact simply citing aggregate data taken from our own reports. It would be ungenerous to single out individual contributions (even if only as a source) from what, as you have experienced, is a collective work of which statistics are, moreover, only a minimal part. It is in fact on the interpretation of trends, the analysis of the most serious incidents, and the understanding of the causes and effects of the phenomena we observe that the Report bases part of its success.

    The main goal of our Association is to promote and spread awareness of information security. Respect for rules and ethical principles is a foundation of the professional field we share. Proof of this, which you forgot to mention, is that the regrettable error made in the 2014 edition of the Report (the consequence of a technical problem with the layout of the print version) was immediately corrected in both the print and electronic versions of the Report.

    Hoping to have clarified the issues you raise, let us in turn make you, in a friendly spirit, a few proposals:
    – why not go back to working together on processing the information for the next reports? You will discover interesting changes in the way we assess significant incidents, as well as new collaborators who have joined our team, alongside whom you too could make a valuable contribution as the expert in the field that you are;
    – why not arrange a comparison of the data that diverge, which in our view are the most frequent cases, to improve the results for both of us?

    The Clusit Report is the product of the work of many contributors who, as you yourself did in the past, strive to do their best to give practitioners, and not only them, as realistic a picture as possible of the security landscape, also covering topics of particular relevance to the ICT security of Italian companies. The global Cybercrime data is accompanied by Italian statistics and by data obtained exclusively from private and public entities, law enforcement agencies, and leading companies in the sector, which reinforce its authority and offer everyone, you included if you wish, the opportunity to discuss those problems to which we are all particularly sensitive (by passion, by profession).

    We look forward to seeing you,

    The Clusit Board

    Below are the details of the methodology used to produce the data in the Clusit Report relating to Cybercrime.

    To carry out Clusit’s annual analysis of the most serious publicly known cyber attacks worldwide, several types of open sources are used, which can be divided into 5 categories:
    – over 100 websites and blogs specializing in ICT Security / Cyber Security (see the attached list of the most used sites);
    – the quarterly / annual reports of the main security vendors;
    – various “mainstream” sites (e.g. corriere, repubblica, ansa, bloomberg, bbc, cnn, nytimes, lastampa, wired, theregister, etc.);
    – about 200 Twitter accounts of companies, organizations and individual experts dealing with Cyber Security;
    – the well-known “Dragon” newsletter from Team Cymru.

    The Clusit analysis is carried out in 4 phases:
    1. Periodically consulting the sources listed above and collecting information on the reported incidents (OSINT);
    2. Selecting, based on “severity” criteria defined empirically by Clusit, which attacks to include in the research and which to discard as not relevant to it;
    3. Analyzing the available data on each attack in order to categorize attackers, victims, techniques used and the geographic distribution of victims, based on criteria defined empirically by Clusit;
    4. Preparing the report for the Clusit Report, with the related charts and the analysis of the phenomena that emerge from the data.

    The first phase of web research yields on average about 100-110 incidents per month, which, once filtered, come down to an average of 70-90 (depending on the month). It should be noted that the number of attacks published by the open sources used is also determined by their editorial needs, and therefore the total number of incidents detectable in a given period (e.g. 1 month) from a pool of sources of this kind is partly limited by the ability and willingness of the sources themselves to publish news of attacks.

    List of the top 100 open sources (selected by frequency) used by Clusit to carry out its research in 2015. The sites marked with an asterisk are those that recur most often.

    * arstechnica.com
    * blog.kaspersky.com
    * blog.team-cymru.org
    * cyberwarzone.com
    * darkreading.com
    * infosecisland.com
    * krebsonsecurity.com
    * nakedsecurity.sophos.com
    * scmagazine.com
    * securelist.com
    * securityaffairs.co
    * securityweek.com
    * thehackernews.com
    * threatpost.com
    * tripwire.com/state-of-security
    anon-news.blogspot.it
    anonhq.com
    bankinfosecurity.com
    blog.cloudflare.com
    blog.detectify.com
    blog.edgewave.com
    blog.emsisoft.com
    blog.fortinet.com
    blog.fox-it.com
    blog.imperva.com
    blog.ioactive.com
    blog.malwarebytes.org
    blog.malwaremustdie.org
    blog.norsecorp.com (NB no longer active)
    blog.spiderlabs.com
    blog.sucuri.net
    blog.trendmicro.com
    blogs.akamai.com/security
    blogs.cisco.com
    blogs.mcafee.com
    blogs.sophos.com
    cbronline.com
    cert.pl
    cio.com
    computerworld.com
    cryptome.org
    cscss.org
    csoonline.com
    cyberreconnaissance.com
    cybersecurityindex.org
    darknet.org.uk
    databreachtoday.com
    ehackingnews.com
    exchange.xforce.ibmcloud.com
    f-secure.com/weblog
    fierceitsecurity.com
    fireeye.com/blog.html
    fossbytes.com
    govinfosecurity.com
    group-ib.com
    hackersnewsbulletin.com
    hackread.com
    htbridge.com/blog
    i-hls.com
    ic3.gov
    ics-cert.us-cert.gov
    incapsula.com/blog
    inforisktoday.com
    infosecurity-magazine.com
    isc.sans.edu
    isightpartners.com
    itgovernance.co.uk
    itgovernanceusa.com
    itpro.co.uk
    krypt3ia.wordpress.com
    labs.bitdefender.com/blog
    labs.bromium.com
    latesthackingnews.com
    malwaretech.com
    mobile.europol.europa.eu
    motherboard.vice.com
    net-security.org
    networkworld.com
    news.softpedia.com
    news.techworld.com
    pastebin.com
    reddit.com
    resources.infosecinstitute.com
    riskiq.com
    rss.voidsec.com
    scmagazineus.com
    seculert.com
    secureworks.com
    securityintelligence.com
    securityledger.com
    shodanio.wordpress.com
    streetinsider.com
    technologyreview.com
    thecybersecurityexpert.com
    theregister.co.uk
    us-cert.gov
    v3.co.uk
    venafi.com/blog
    vulnerabilitycenter.com
    welivesecurity.com
    ______________________________

  • Pingback: Three Days… And Still No Answers – HACKMAGEDDON
