International Workshop on Crowd-Based Requirements Engineering

The “International Workshop on Crowd-based Requirements Engineering” (abbreviated "CrowdRE") is a workshop that is co-located with the IEEE International Requirements Engineering Conference. It brings together researchers working on CrowdRE and practitioners who apply it, so that they can draw inspiration for their continued work. Typically, the workshop includes a compelling keynote, presentations of case studies, empirical work, and visions, as well as an interactive session. So far, every edition of the CrowdRE workshop has been an important milestone in the further development of this approach. Fraunhofer IESE has co-organized the workshop since its first edition.

Keynote by Prof. Dr. Martin Glinz at CrowdRE 2019

We are excited to be able to share a video recording of the groundbreaking keynote on Crowd-based Requirements Engineering (CrowdRE).

The keynote "CrowdRE: Achievements, Opportunities and Pitfalls", held at CrowdRE’19 on 24 September 2019, summarizes the personal views of Prof. Dr. Martin Glinz, which we are proud to share with you.

The keynote shown in this video has made several key contributions to the domain of CrowdRE. To learn more about how the talk by Professor Martin Glinz practically redefined CrowdRE, how Glinz's views affect all researchers and practitioners in this domain, and to read more about who Dr. Glinz is, please see our Fraunhofer IESE blog.

 

Transcript of the video

Introduction

Thank you very much, Eddy, for this very nice introduction.

Ladies and gentlemen, I feel very honored to speak to you in this workshop about what CrowdRE is and what we can do with it; what we have done, and what the potential is, where it can go.

 

The definition of CrowdRE

So firstly, I am always a person – Eddy said it – who likes terminology. So, let us briefly look into what "CrowdRE" is.

Well, obviously there was a first paper about CrowdRE, by Eddy, Jörg Dörr and Sebastian Adam from 2015. And there you find the definition. It says it is "a semi-automated RE approach for obtaining and analyzing any kind of 'user feedback' from a 'crowd', with the goal of deriving validated user requirements."

Then, two years later, there was an IEEE Software paper by many people who are also involved in this CrowdRE workshop, with Eddy again as the first author, and there is another, slightly different definition: "An umbrella term for automated or semiautomated approaches to gather and analyze information from a crowd to derive validated user requirements."

So, what we see here is that it says 'semiautomated or automated'. It says something like, well, 'user feedback'. And it says it is about 'user requirements'.

And obviously, then the next question is: well, who is this 'crowd'? These two papers also give an answer to that. So, the first paper says the crowd should be considered "as a pool of current and potential stakeholders". And the IEEE Software paper is even a bit more narrow; it says it is a pool "of current or potential users of a software product".

So: stakeholders; users; and it is about a software product. That is what it says here.

And honestly, I think this is… When looking at the field from the perspective of maybe a somewhat informed outsider, if you permit, this looks a little bit narrow.

And so, I would propose to go with a slightly broader definition of what CrowdRE is. You will find that in that short paper in the workshop proceedings.

This definition sees CrowdRE as "any approach that engages a crowd of mostly unknown people for performing RE tasks, or for providing RE- / requirements-relevant information".

And why? Why is this broader? Because it looks generally at systems and not just at products. It goes beyond user feedback, and in that way also beyond evolving systems. In my view, it is also applicable to the development of new systems, while user feedback typically carries the idea that there is already something existing – that is, something in the field – and that we can mine or somehow elicit the feedback from users for evolving the system.

It goes from the notion of end users or maybe stakeholders to potential stakeholders and actually beyond. We can imagine situations where the members of our crowd are not stakeholders at all.

It goes beyond elicitation to any kind of RE activities, and I think we can also go beyond automation, and also look into manual approaches that might be worthwhile in CrowdRE.

So, let us base the stuff that I am talking about on this definition.

 

The roots of CrowdRE

The next thing is, typically, to give credit not only to these initial authors but also to all the people who are at the roots of what we currently call CrowdRE.

And here are a couple of those roots, along with some selected references on where to find them.

I think, obviously the very notion of crowdsourcing is one of the roots of something that calls itself 'CrowdRE'. Then also traditional elicitation techniques like questionnaires or opinion polls involve a crowd of people for eliciting requirements. This is obviously also one of the roots.

Then everything that is collaborative in RE, like collaborative elicitation, collaborative prioritization; you name it.

All the stuff that is called market-driven RE, with Björn Regnell's papers among the first. The idea here is: you are not a supplier who supplies to a single customer; instead, an organization is developing or evolving systems for a market, for a target user community.

It obviously is about end-user involvement, with Norbert Seyff's work among the first to talk about that in requirements engineering.

And I think also one of, say, our very important instruments nowadays for CrowdRE: the idea of mining RE texts with machine learning techniques. One of the actually very first papers I found about that is the paper by Jane Cleland-Huang and others in REJ [Requirements Engineering Journal] in 2007, where you will find these keywords at least for the first time. This paper is based on an RE'06 [IEEE International Requirements Engineering Conference 2006] paper, but there the ideas are not yet as elaborated. I think this is one of the actually very first papers that propose this kind of machine learning technique – classification techniques – and apply it to RE.
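To make the idea of such classification techniques concrete, here is a minimal, purely illustrative sketch in Python. It is not the approach of the cited papers; it uses invented feedback statements and a simple TF-IDF text classifier built with scikit-learn to sort crowd feedback into coarse categories.

```python
# Purely illustrative sketch (not the method of the cited papers): classify
# short, invented feedback statements with TF-IDF features and a linear model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The app crashes whenever I rotate the screen",    # bug report
    "Login fails after the latest update",             # bug report
    "Please add a dark mode for reading at night",     # feature request
    "It would be great to export my data as CSV",      # feature request
    "I love how fast the search is now",               # praise
    "Great redesign, very easy to use",                # praise
]
labels = ["bug", "bug", "feature", "feature", "praise", "praise"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Classify new, unseen feedback items.
for item in ["The map view freezes on startup", "Could you add offline support?"]:
    print(item, "->", model.predict([item])[0])
```

In a realistic setting, the training data would of course be a labeled corpus of real feedback, and the categories would be chosen to match the RE tasks at hand.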

So here are some of those roots.

 

Achievements of CrowdRE

And now, let me take you on some kind of guided tour through what I call "achievements, opportunities, and pitfalls". Let me say two words of caution before I start with this. The first: obviously, what I am sharing here with you is my personal opinion. I do not intend to do something like a systematic literature review or to give credit to everything that has been published somewhere. So, it is an obviously personally biased selection of what I deem important, and also of what I deem to be somehow representative of what we have.

When I say 'achievement', that does not necessarily mean that it is, say, proven practice in industrial RE. It means that we have publications about it. There are approaches that we know about, and that I think are applicable, regardless of to what extent they are actually applied in industry nowadays.

So, achievements: this basically summarizes the table from my short paper. And I take the liberty of not having a slide per point here, but of putting some related things together on a single slide.

And the first one is the one about stakeholders. So, the question: "Who are my stakeholders? Where are my stakeholders?" Well, the notion of stakeholders is well known to everybody. But what needs to be thought about when it comes to CrowdRE is that to some extent we have the classic notion of stakeholders within organizational reach. That means all the people who are somehow affiliated with one of the organizations involved in the development or evolution of a system, and who typically can be told or even mandated to contribute, so that we can say: "Well, you please give us feedback; you please participate in the prioritization of those requirements."

However, particularly when we go towards market-driven RE, we also have stakeholders that we consider to be outside organizational reach. This means that these people are typically not known by their names, or even if they were, they have no obligation whatsoever to contribute.

So, this means we have some challenges in finding these kinds of people, who might be important for giving us requirements, and also in motivating them to actually contribute. Let me mention two approaches here. The first one is a classic snowballing approach, applied to finding and identifying stakeholders. This is particularly the work by Lim from around 2010. The second is the idea of attracting stakeholders by providing something like a community platform, plus certain ideas about social media and access channels, plus motivating people not only to see, "Oh, there is a platform", but also to feel, "It could be attractive to contribute there"; to actually show yourself as a stakeholder. This is very recent work by Martina Kolpondinos and myself, in REJ this year. Her work is a bit older, but the, say, comprehensive approach to this is in that REJ paper.
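As a rough illustration of the snowballing idea, here is a simplified sketch in Python; it is not Lim's actual method and uses invented names. It follows stakeholder recommendations breadth-first and ranks candidates by how often they are named.

```python
from collections import Counter, deque

# Hypothetical answers to "Who else should we talk to?", one entry per
# already-identified stakeholder (invented data).
recommendations = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["dave"],
    "dave": [],
}

def snowball(seeds, recommend):
    """Breadth-first snowballing: follow recommendations until no new names appear."""
    found, queue, mentions = set(seeds), deque(seeds), Counter()
    while queue:
        person = queue.popleft()
        for candidate in recommend.get(person, []):
            mentions[candidate] += 1
            if candidate not in found:
                found.add(candidate)
                queue.append(candidate)
    # Rank candidates by how often they were recommended (a crude salience proxy).
    return found, mentions.most_common()

stakeholders, ranking = snowball(["alice"], recommendations)
print(stakeholders)  # e.g. {'alice', 'bob', 'carol', 'dave'}
print(ranking)       # e.g. [('carol', 2), ('dave', 2), ('bob', 1)]
```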

Then there is the classic: "Crowd, give me your ideas; give me your needs; give me your priorities", with crowd collaboration. Again, there are different ways of how we can do this. One idea – this is from Norbert Seyff, basically – is that we could exploit stakeholders' existing social networks and platforms. So basically, Norbert's idea was: well, if you have known stakeholders and these stakeholders are, for example, on Facebook, or on LinkedIn, or on you name it, then why not somehow encourage these stakeholders to use this network to find other stakeholders who might then want to contribute? Or providing an RE community platform, for example for doing prioritization. One of the early works here is Winbook by Kukreja and Boehm, 2012: the idea of doing WinWin negotiation and prioritization with a kind of community platform.

However, the idea here is: well, we have somehow known stakeholders; we have people who have some intrinsic motivation to contribute. In general, this is not the case. It is not very clear whether people will really contribute to this kind of collaborative crowd work. And, well, if you serve an existing community it is fine. However, when we have unknown communities, we need explicit motivational work, and gamification is obviously one approach that can be taken here. One of the early works is by Snijders, Dalpiaz and others on gamification, on gamified platforms. And a more recent one is, again, by Martina Kolpondinos and me; our REJ paper. It was also, I think, an RE paper in 2017 or so. Collaboration in the crowd.

Then obviously we also have the oldies: questionnaires. Everybody knows this as a technique for crowd elicitation. At least when you have a list of potential people whom you can address by email or approach on certain channels, you can send out questionnaires and try to get requirements, information, and feedback that way. And when we go to the market, there are things like opinion polls: "What would people like?" This classic marketing instrument can also be used for something that we would call CrowdRE.

Then here we have, if you want, the classic core from which the notion of CrowdRE more or less emerged: the idea of feedback. You see here a rather long list of references, although this is still a selection. But here I actually included the most references, to give at least some credit to the people who contributed. We can distinguish between three approaches. One approach is the classic one of enabling user feedback. That means we provide technical means within our systems to make it easy for users to give feedback, and thus implicitly motivate them to contribute.

The next thing is polling user feedback. That means we actively approach the user community and say, "Give us your feedback!" Well, this ranges from mildly encouraging them, via some psychological motivation mechanisms, to something that we could actually call coercing users. So, if you have ever booked a hotel room on one of these famous platforms, then what typically happens is: as soon as you have stayed, you get an email: "Please rate your stay." And then, if you have not done this, five days later you probably get a reminder: "Why don't you want to…?" And then fourteen days later: "Well, it is now the last chance to give a rating of that hotel!" Et cetera, to somehow apply social pressure on people to give feedback.

And what we also could do here, and this is also a feedback category, is deducing user feedback by, well, not explicitly asking what users report back but, for example, monitoring what users are actually doing. If users are somehow voting with their feet, or giving feedback with their feet, this is something we can exploit. By looking into: What is the actual user behavior? Are there certain features that are being used? Is there a pattern like "If users use feature A, they typically also use feature B"? Can we observe things like "Users go into working with feature A, do it for 15 seconds, and then do something else"? That might be an indicator that something is wrong with that feature. So, we derive feedback without having explicit feedback from users.
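As a minimal sketch of such deduced feedback, the following Python snippet flags features whose sessions are mostly abandoned almost immediately; the usage events and the 15-second threshold are invented for illustration.

```python
from collections import defaultdict

# Hypothetical usage events: (user, feature, seconds spent before switching away).
events = [
    ("u1", "A", 12), ("u1", "B", 240), ("u2", "A", 9),
    ("u2", "B", 310), ("u3", "A", 14), ("u3", "C", 95),
]

durations = defaultdict(list)
for user, feature, seconds in events:
    durations[feature].append(seconds)

# Flag features that users typically leave within ~15 seconds: a possible
# signal (not proof) that something is wrong with the feature.
for feature, secs in sorted(durations.items()):
    short = sum(1 for s in secs if s <= 15)
    if short / len(secs) >= 0.8:
        print(f"Feature {feature}: {short}/{len(secs)} sessions abandoned within 15s")
```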

Using our user community as our guinea pigs, hopefully in a way that they do not notice that they are guinea pigs. Because people typically do not like to be that. The classic obviously is A/B testing, which is a way of getting feedback from a large community.
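As a minimal illustration of evaluating a single A/B test, the sketch below compares two variants with a two-proportion z-test, using invented numbers and only the Python standard library; real experimentation platforms do far more than this.

```python
import math

# Hypothetical outcome: users who adopted the new feature in variant A
# (control) versus variant B (with the change under test).
clicks_a, users_a = 120, 2400
clicks_b, users_b = 165, 2500

p_a, p_b = clicks_a / users_a, clicks_b / users_b
p_pool = (clicks_a + clicks_b) / (users_a + users_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se
# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p = {p_value:.4f}")
```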

Again, system usage monitoring somehow goes here: you put something into an existing product out in the community, see how it is taken up and how people work with it, and then derive information from that.

And also, I would say, something like profile mining goes here: how apps or systems are actually being used, by which users, and in which way; which kinds of features are never used, which kinds of features are frequently used, et cetera. That gives us a lot of information.

So, Eddy mentioned SUPERSEDE. SUPERSEDE is actually one of those projects that first tried to systematically look into monitoring combined with feedback, to exploit these things and create a kind of synergy between explicit feedback and monitoring.

Okay, so this concludes our tour through the achievements.

Also, I should mention that RE-at-runtime approaches go here as well, to some extent: we monitor how people work with systems and how requirements are satisfied or evolve. This is also, to some extent, this kind of guinea pig approach.

 

Opportunities of CrowdRE

Okay, opportunities.

Opportunities are things that I think could be done with CrowdRE techniques, but that have not been tapped, or at least not been looked into, researched, or published in a significant way so far.

So, my first one is something like using classic crowdsourcing for RE tasks. This is only CrowdRE if we take the broader view, because it typically does not involve stakeholders; instead, it can involve people who are actually non-RE people doing RE tasks.

Let me give a simple example, which I think is actually a worthwhile idea to try out. So, if somebody is looking for a research topic for empirical work, here it is: When we generate trace links automatically with tools, and we want to do this on an industrial scale, then what we have to do is vet all the generated links and throw out all the false positives. Today this is typically done by trained and expensive RE people. However, for vetting these kinds of links, you do not need RE training. The only thing you need is to be a generally trained person who is able to look at two documents, read them, and then say: "Well, is this generated candidate link actually a trace link or is it not?" So, this would be a candidate for classic micropayment crowdsourcing. You would just set some requirements on who can do these tasks, say at least some basic academic training or whatever, and obviously English reading skills, but you do not need to be an RE person to do this.
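As a minimal sketch of the aggregation side of such micropayment crowdsourcing, the snippet below collects each worker's yes/no verdict on a candidate link and keeps the link only if a majority confirms it. The requirement and artifact identifiers are invented, and a real setup would also control worker qualification and measure inter-rater agreement.

```python
from collections import defaultdict

# Hypothetical crowd verdicts: "Is this generated candidate link a real trace link?"
verdicts = [
    ("REQ-12", "design.md#3", True),  ("REQ-12", "design.md#3", True),
    ("REQ-12", "design.md#3", False), ("REQ-07", "api.md#1", False),
    ("REQ-07", "api.md#1", False),    ("REQ-07", "api.md#1", True),
]

votes = defaultdict(list)
for requirement, artifact, answer in verdicts:
    votes[(requirement, artifact)].append(answer)

# Keep a candidate link only if a clear majority of workers confirmed it.
for link, answers in votes.items():
    accepted = sum(answers) / len(answers) > 0.5
    print(link, "ACCEPT" if accepted else "REJECT", f"({sum(answers)}/{len(answers)} yes)")
```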

Another opportunity that I see, and that in my view is currently largely untapped, is what I would call Open Source RE. We all know that Open Source software development is a tremendous success story. And what about doing the same just with requirements? That means, the goal is not that we do Open Source software, but that we just do Open Source requirements. There are some obvious application cases where I think that could work.

Think, for example, about autonomous driving. This is a topic of extremely broad interest to a lot of people. It is a topic where there are a lot of people out in the world who bring in a lot of experience, from automotive engineers to insurance experts, to politicians, to people thinking about ethics, or whatever, you name it. Bringing together all these people and trying to come up with a kind of general body of requirements for certain categories of autonomous driving could have a tremendous effect. Because, well, everybody needs it. Lots of people would have a motivation to contribute. Regulators, for example, would love this, because they could take this kind of body of knowledge and say to a concrete automotive company that comes with its system: "Please show the compliance of your requirements with these requirements." Because this body of requirements has been validated by a lot of experts in the field. It has been calibrated. Missing requirements have been found. Requirements that are somehow strange have been ruled out, et cetera. So, if enough people contribute, we get the same effect as in the successful Open Source projects: a certain level of quality, and also of validation, arises. That could be really interesting for requirements in fields where there is a lot of public interest. Maybe also in having the requirements open and not proprietary to individual automotive makers.

Another potential application: think about all sorts of personal travel assistants. You think, "Well, I am a person, I have a transportation problem from A to B. I do not have my car out there in the parking lot." And then you have lots of options, from walking, to calling GoCars, to using public transport, to getting an e-bike that is standing somewhere nearby. Under certain conditions: Is it raining? Am I going to a party, nicely dressed, or would I be happy to go with an e-bike? Et cetera. And then you get a kind of recommendation for what to do. This is also a topic of rather broad interest, where I think it could be interesting to go with some publicly sourced requirements that people could then build upon for actually developing such systems. And there might be more. So, if you come up with more ideas, please let me know, for the next version of this talk.

 

Pitfalls of CrowdRE

Finally, the pitfalls.

This is a nice picture of the La Brea Tar Pits in ancient times. They are a very interesting source of research for paleontologists today, because these creatures actually got trapped in the tar, were killed and then preserved by it, so we can study their remains today. Well, in CrowdRE we would probably like to avoid being trapped in the tar and having our failures analyzed later by others, so it might be better to avoid these kinds of tar pits.

So the first one is the belief that Surowiecki expressed in the title of his famous book, "The Wisdom of Crowds: Why the Many Are Smarter Than the Few". This is frequently the case, but unfortunately not always, so I disagree with this statement. And here is a famous example. You all know i*, or if you do not, you should. Here is the standard i* symbol set. i* has frequently been criticized for having these very abstract symbols that do not give any visual clues about their semantics. And so, people have tried to come up with better symbol sets. There is a nice paper from RE 2013 by Caire, [Genon,] Heymans, and Moody about experiments in designing better symbol sets for i*. What they did was basically two things: Firstly, they asked a group of experts to design something that Moody calls "semantically transparent symbols". And secondly, and this is what you see here, they asked the crowd to come up with symbols and then held a kind of vote to select the most popular ones. And unfortunately, here you see that the crowd is not always wise. So, for example, take this symbol [of a cross], the symbol for "belief". Unfortunately, this is not semantically transparent but semantically misleading, because i* beliefs are not religious beliefs; this is something completely different in terms of the semantic notion. Or take this symbol here, the "actor" symbol. It is a bit hard to see, maybe, from the back. It is a symbol of an actor standing on a stage. This once again is somewhat misleading, because an i* actor can be a human, but it could also be a system; it could be something automated. And typically, we would not associate an automated system with a person on a stage. And secondly, there is a very practical problem with that symbol: typically, you want symbol sets that you can also draw quickly by hand, and when you draw the first symbol by hand, with this kind of stage, you throw this stuff away and say, "Well, it is really much easier to do things like that." So, what comes out of the crowd is not always optimal.

Pitfall number two: featuritis. So, the question: "How many and which features do our users really need?" And also: "How many can they actually digest?" When Jan Bosch gives a talk, he says: "Well, look, if you have a car, you want to have innovation, and you want to discover some new feature in your car every day." Do you really want this?! When you are driving somewhere, turn on your radio and want to hear some music, and it does not work the way it always works, because some very nice new feature (your new feature of the day) comes in and distracts you from driving because you have to relearn this stuff; do you really want that? So, the question is, when we do feedback-based elicitation, how we avoid implementing tons of features that only a few users actually want. It might be good for those few, but actually bad for many other users who never gave the feedback "We do not want this feature!" You only heard those who wanted it.

And this is very strongly connected with the next one: listening to the loudspeakers. The loudest users are not always the most important ones, and we must be very careful about this when we are working with user feedback. Are those who are constantly giving feedback actually the users we target? Are they the group of users that matters to us, the really important users, such that if we satisfy them, we will get more market penetration, sell more products, make customers happier, whatever? Or do we just have a very active, very small, and not really important fraction of our users, while all the rest are satisfied and quiet, and maybe only speak up when it is too late, because we did the wrong thing?

Next one: feedback fatigue. The more everybody wants my feedback, the more tired I might get of actually giving it. So, this is the kind of process where too many people constantly wanting feedback eventually leads to a situation where people become tired of it and say: "Well, go away with this stuff. Leave me alone!"

Inadequate motivation. Yeah, the basic question when you are doing crowd stuff: why should I contribute and help you solve your problems, which are not my problems? So, crowd members need motivation to contribute, because when there is no motivation, nobody will contribute voluntarily. Then we only get contributions from people whom we can tell to contribute, or on whom we can at least exert some social pressure, if not real pressure because they are employees. And even if there is motivation, when the motivation is wrong or the motivation concept is inadequate, we might have the effect that some people actually contribute, but they might be the wrong ones: we do not motivate the people we would like to have contributions from, but those that are not really interesting to us, while the people that we target do not contribute.

And finally, when an explicit motivation concept is broken, it might be detrimental in the sense that it destroys people's existing inherent motivation. Those who have some inherent motivation actually get that motivation destroyed. This is particularly important in all sorts of gamification approaches. It does not suffice to just add some points and levels and then think: "Well, everything comes automatically because it is now gamified." If we do it wrong, if the motivation concept behind the points and levels et cetera is not good, people might be totally demotivated, because it is way too easy to progress, or way too hard to progress, for example, in gamified things. Or you think: "Well, this is totally weird: for stuff which I do not like to do and nobody does, you get a lot of points; for other stuff, which is really a lot of work, you get next to nothing. Why should I support this kind of broken thing?" So, it is important to get that right.

And a final pitfall: organizational limits. Think particularly about A/B testing. How many A/B tests can you run in parallel without losing configuration control over what you are doing, and without unwanted interactions between different A/B tests? If you do not keep the things that you are testing separate, you might get interference, in the sense that certain A/B tests somehow bias the results of other A/B tests. And eventually, if you do too many of them, your customers might notice and get annoyed. There is a nice paper by Katja Kevic and others, who studied A/B testing in a real-world environment, Microsoft Bing, and found that there really are organizational limits to doing A/B tests on a large scale. You cannot just say: "Well, we are doing everything with A/B testing." It does not work; otherwise you actually run into these problems.

Okay, so, this somehow concludes my list of potential pitfalls.

 

Summary & Conclusions

So, let me wrap up; summary and conclusions:

Take a broad view of CrowdRE: when we think bigger, we can also achieve more with techniques that fall under this umbrella of CrowdRE.

There are a lot of things where we actually have results, where we can say we have achievements. This is known. So, what is left for us now is to build on this, actually use it in practice, and develop it further.

There are opportunities out there, just sitting there, waiting to be exploited. On the scientific side, this means, for example, doing studies or other empirical work to figure out whether something is just a nice idea that people unfortunately do not like, or how it should be done to make it successful, or even trying it out in a real setting, in a real context.

And finally, when we are doing CrowdRE, let us look where the tar pits are and take our steps wisely, so that we are not trapped, caught, and later analyzed by paleontologists.

Thank you.