On Wed, 2010-11-24 at 10:18 +0100, Mathieu Goeminne wrote:
> Hello,
>
> Thanks for your very precious and rapid feedback. I talked to my PhD
> supervisor, Professor Tom Mens about it (head of the software engineering
> group of the university, in CC of this mail), and he proposed to sign a
> non-disclosure agreement in which we promise not to make available the
> information about the physical persons involved (or the identities and
> logins they use).
> For our research purposes, the results we will produce will not contain any
> personal information. Essentially, our results will be primarily numerical
> and visual results that will be analysed statistically. We will ensure to
> respect any privacy policy that will be imposed.
>
> Concerning your second remark about git.gnome.org, we already use the
> guidelines you suggest, but they are not sufficient for our analysis
> purposes, since we still find quite a number of false positives and false
> negatives during our data analysis, and moreover this data does not contain
> information about identities used by the maillers and bug trackers.

What do you mean by false positives and false negatives? (Not the
statistical definition, but in your samples.)

You can always apply Pareto here: 80% of the code is written by 20% of the
contributors. And for them, the mismatches are not hard to fix (it is
harder when you are not familiar with the contributors in the project, but
not that hard anyway).

You will face bigger problems when mining GNOME git repositories, and you
might double or triple count contributions in particular repositories
(especially from the Subversion era). You will find some huge commits
(thousands of lines of code) with no new code at all, or the same history
repeated across several repositories with different hashes, and so on.
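As a rough illustration of the double-counting point, here is a minimal Python sketch that collapses commits repeated across repositories under different hashes. It is only one possible approach, not something taken from GNOME's tooling: the `Commit` record, its field names, the sample data, and the choice of fingerprint (author email, author timestamp, subject line) are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical record; the fields mirror what `git log` can report.
Commit = namedtuple("Commit", "repo sha author_email author_time subject")

def fingerprint(commit):
    """Identify a change without relying on its hash (an assumed heuristic)."""
    return (
        commit.author_email.strip().lower(),
        commit.author_time,
        commit.subject.strip(),
    )

def deduplicate(commits):
    """Keep the first commit seen for each fingerprint, in input order."""
    seen = {}
    for commit in commits:
        seen.setdefault(fingerprint(commit), commit)
    return list(seen.values())

# Made-up sample: the first two entries are the same change in two repositories.
commits = [
    Commit("module", "a1", "dev@example.org", 1100000000, "Fix leak"),
    Commit("module-copy", "b2", "Dev@Example.org", 1100000000, "Fix leak"),
    Commit("module", "c3", "other@example.org", 1100000500, "Add tests"),
]
unique = deduplicate(commits)
```

A fingerprint this simple can still merge distinct commits or miss rewritten ones, so the result is best treated as a list of candidates to review by hand.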
> Our goal is to have a really unified view on the different data sources used
> during OSS development (committers, bug trackers, mailers), which is why the
> information contained in your LDAP will be very useful to us. Of course, we
> do not need *all* the information stored in the LDAP, only the information
> that will allow us to link identities to real persons. (Things like
> passwords and so on are entirely irrelevant for us, of course.)

Peter Rigby has worked on unifying committers and mailing lists and, as far
as I know, he used some techniques proposed by Chris Bird (I do not have the
reference at hand, but you will find them).

Regards,

--
Germán Póo-Caamaño
http://www.gnome.org/~gpoo/