Big Data

Big Data – Mastering the Deluge of Data

The discussion about Big Data is about much more than the clever and profitable analysis of Internet data. In the age of the fourth Industrial Revolution, with the emergence of cyber-physical systems and ultimately highly integrated Smart Ecosystems, the issue is to generate tangible added value for companies and individuals from the potential availability of a seemingly endless stream of data. When it comes to Big Data in a Smart Ecosystem, three central sources of data play a key role: classical embedded software systems, business information systems, and humans. In this context, it is important not to jeopardize the functional safety of the embedded systems involved and, at the same time, to assure the security of the data within a Smart Ecosystem in the long term. This is also the prerequisite for humans to accept their role as central figures in a Smart Ecosystem. Acceptance here encompasses the willingness to enter data into such a system on the presumption that these data will be secure, while at the same time obtaining actual added value from the usage of these data.

Of course, Big Data raises many questions regarding how to deal with the users of Internet services. At the same time, there is no doubt that the analysis of data on the Internet makes it possible to create better offers that are more appropriate for the respective target customers. However, the Internet in the stricter sense is just one of many areas in which Big Data will play an ever more important role. To the same extent that digitalization increasingly permeates all areas of our lives, more data are generated, which is why Big Data also has a cross-cutting role.

Big Data – the Biggest Challenges

It is imperative to master some central aspects of Big Data in Smart Ecosystems if such systems are to become reality. One of the challenges in practice will be to find suitable infrastructures: In a Smart Ecosystem, companies of various sizes collaborate. For the analysis of the data, these companies need a Big Data infrastructure with appropriate computing power. This entails several challenges for such companies. Small and medium-sized companies in particular might not be willing or able to purchase a dedicated infrastructure. Here, new solution strategies must be found: On the one hand, the trend towards storing data in the Cloud offers first approaches for Big Data analyses. On the other hand, temporary infrastructure lease approaches are conceivable for the study of sensitive data. Both approaches can be thought of as Big Data analyses “on demand”.

Another problematic issue continues to be the progressive establishment of different, partly incompatible technologies for use in Big Data analyses: A heterogeneous landscape of Big Data providers is currently developing, which manifests itself differently in the Smart Ecosystems of different companies. Cross-company analyses, however, require de facto standards within an ecosystem, compatible interfaces between provider systems, and a suitable, high-performance intermediate layer for data exchange.

Establishment of Innovative Business Models: It is generally accepted that innovative business models are a central issue in Smart Ecosystems. In order to realize them, new service providers will have to be founded, some of a kind that does not yet exist. One example would be the provision of the above-mentioned “on-demand” analyses. However, potential stakeholders in such an innovative business model in a Smart Ecosystem frequently raise questions regarding risk management and the feasibility of the business model. If research develops simulation environments for such Smart Ecosystems in the sense of rapid-prototyping environments, the question about added value can be answered better. Once processes and technologies are more mature and can guarantee data protection for the analysis and its data, the barriers will come down.

Standardization: To achieve an efficient exchange of data among companies, and of analysis results obtained with different technologies, standardization of the data, of their modeling processes, and of the specification of data qualities would be very helpful. Many standardization processes nowadays take place in specific domains, e.g., in mechanical engineering, in the automotive industry, or in the financial sector. It is characteristic of the Smart Ecosystems of tomorrow, however, that stakeholders from a variety of domains are involved in an ecosystem, with the number of stakeholders in the ecosystem varying widely. This can make it very hard to achieve standardization quickly and thus constitutes one of the greatest challenges.

The numerous conceivable application scenarios in a Smart Ecosystem cannot be regarded independently of each other, but should rather be considered excerpts of a continually evolving system. In this system, new services and organizations are added and replace others over time. This intertwining can already be observed today in the area of energy and electromobility, where electric vehicles consume energy on the one hand, but can also be used for decentralized energy storage on the other hand. Another example is the interconnection between production technology and intelligent mobility systems. Here, the goal is to reduce the transport and waiting times of goods and to be able to react flexibly to the re-planning of production processes.

Big Data – Our Research Topics

Decision Models: Nowadays, a multitude of data are collected across a huge variety of systems, but it is often not clear how these data can be used in a meaningful way in the context of strategic orientation while taking data security into account. Fraunhofer IESE is working on how decision models must be constructed systematically based on business goals and how data must be aggregated or condensed accordingly in order to permit efficient decision-making.

Data and Information Quality: Another basic problem with Big Data analyses is trust in the data and in the information derived from them. The data may stem from widely different systems of various organizations, and details about the collection method used and the quality assurance performed are not always known. The explicit modeling of the data, including the requested quality features such as completeness, consistency, or up-to-dateness of the data, is based on approaches from the area of software quality modeling, which is one of the areas of expertise of Fraunhofer IESE.
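
To make the three quality features named above more concrete, the following minimal Python sketch computes a completeness, a consistency, and an up-to-dateness ("freshness") score for a small set of records. It is an illustration only and not the quality model used at Fraunhofer IESE: the record fields (sensor_id, temperature_c, measured_at), the plausibility rule, the two-hour threshold, and all function names are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Hypothetical sample records from different source systems (illustrative only).
RECORDS = [
    {"sensor_id": "s1", "temperature_c": 21.5, "measured_at": datetime(2024, 1, 1, 12, 0)},
    {"sensor_id": "s2", "temperature_c": None, "measured_at": datetime(2024, 1, 1, 11, 0)},
    {"sensor_id": "s3", "temperature_c": 400.0, "measured_at": datetime(2023, 12, 31, 12, 0)},
]
REQUIRED_FIELDS = ["sensor_id", "temperature_c", "measured_at"]


def completeness(records, required_fields):
    """Share of required field values that are present (not None)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for f in required_fields if r.get(f) is not None)
    return present / total


def consistency(records, rule):
    """Share of records that satisfy a caller-supplied plausibility rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def freshness(records, now, max_age, timestamp_field="measured_at"):
    """Share of records whose timestamp is at most max_age older than now."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for r in records
        if r.get(timestamp_field) is not None and now - r[timestamp_field] <= max_age
    )
    return fresh / len(records)


def plausible_temperature(record):
    """Assumed rule: missing values are left to completeness; others must be in range."""
    value = record.get("temperature_c")
    return value is None or -50.0 <= value <= 150.0


def quality_report(records, now):
    """Bundle the three scores so they can travel together with the data."""
    return {
        "completeness": completeness(records, REQUIRED_FIELDS),
        "consistency": consistency(records, plausible_temperature),
        "freshness": freshness(records, now, max_age=timedelta(hours=2)),
    }


report = quality_report(RECORDS, now=datetime(2024, 1, 1, 12, 30))
```

Attaching such a report to a data set is one conceivable way of giving a receiving organization a basis for judging how far the data can be trusted, even when the collection method itself is unknown.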

Data Visualization: Not all decisions in a Smart Ecosystem are made on the basis of a control system that can be automated; rather, many decisions are made by humans. In order to support decision makers, visualization mechanisms are thus indispensable. Here, a rough distinction is made between approaches for the condensation and user-appropriate exploration of Big Data and approaches for the visualization of large amounts of data with efficient algorithms. At Fraunhofer IESE, one of the areas of work concerns the question of which visualization mechanisms are suitable for Big Data and how users must interact with these in order to be able to make decisions efficiently.

Acceptance: An overarching issue regarding the usage of Big Data in Smart Ecosystems is the acceptance of such systems by humans as their central users. On the one hand, the security of the provided data must be guaranteed, while humans should be integrated as a data source as reliably and transparently as possible; on the other hand, humans should not be overwhelmed with the diversity of information, but should rather be provided with the amount of information that is appropriate for making the respective decision.

Data Usage Control: Central issues for companies in the context of Smart Ecosystems are data ownership and data protection. Even though sensitive data such as production and quality data might offer great potential for scenarios such as cross-company production data analysis in the context of automobile manufacturing, these data are not made freely available by the stakeholders. Research in the area of data protection for Big Data in Smart Ecosystems is therefore particularly important, as it eliminates a core obstacle for cross-company analyses. Research being performed at Fraunhofer IESE in the area of Usage Control already allows effective protection of data leaving a company’s own premises, through the use of correspondingly formulated policies and a modified infrastructure environment (enforcement frameworks). These frameworks are being optimized successively for the needs of Big Data technologies. Here it must be investigated whether the required performance in the analysis of the data can be achieved if existing usage control technology is combined with Big Data technologies.
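
The idea of policies that travel with the data and are checked by an enforcement component can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions and does not reproduce the enforcement frameworks developed at Fraunhofer IESE: the classes UsagePolicy and ProtectedDataset, the helper try_access, the policy attributes (recipients, purposes, expiry date, maximum number of uses), and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical policy formulated by the data owner."""
    allowed_recipients: frozenset
    allowed_purposes: frozenset
    valid_until: datetime
    max_uses: int


class UsageDenied(Exception):
    """Raised when a requested usage violates the policy."""


@dataclass
class ProtectedDataset:
    """Data bundled with its policy; every access passes through access()."""
    rows: list
    policy: UsagePolicy
    uses: int = 0

    def access(self, recipient: str, purpose: str, now: datetime) -> list:
        p = self.policy
        if recipient not in p.allowed_recipients:
            raise UsageDenied(f"recipient {recipient!r} is not permitted")
        if purpose not in p.allowed_purposes:
            raise UsageDenied(f"purpose {purpose!r} is not permitted")
        if now > p.valid_until:
            raise UsageDenied("policy has expired")
        if self.uses >= p.max_uses:
            raise UsageDenied("maximum number of uses reached")
        self.uses += 1  # only permitted accesses count against the limit
        return [dict(row) for row in self.rows]  # hand out copies, not the originals


def try_access(dataset, recipient, purpose, now) -> Optional[list]:
    """Convenience wrapper: returns the rows, or None if the usage is denied."""
    try:
        return dataset.access(recipient, purpose, now)
    except UsageDenied:
        return None


# Illustrative setup: production quality data shared with one named partner.
policy = UsagePolicy(
    allowed_recipients=frozenset({"supplier-a"}),
    allowed_purposes=frozenset({"quality-analysis"}),
    valid_until=datetime(2025, 1, 1),
    max_uses=2,
)
dataset = ProtectedDataset(rows=[{"part": "A", "defect_rate": 0.02}], policy=policy)
```

In this sketch, a call such as try_access(dataset, "supplier-a", "quality-analysis", datetime(2024, 6, 1)) returns the rows, whereas a request by another recipient, for another purpose, after the expiry date, or beyond the permitted number of uses returns None. Every access pays for the policy check, which is why the performance question raised above becomes relevant once such checks are applied to Big Data volumes.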