Open Source and Intellectual Property in the Digital Society
Technische Universität Berlin
Syeda Mehak Zahra
Meryem Naseer

Challenge Description

Prepare a case study on how a specific FLOSS solution became a de-facto standard: there are many examples of FLOSS solutions that attracted dominant industry support and became de-facto standards – like the Linux kernel, OpenStack, or the Apache web server.
How do FLOSS solutions reach such a status? Are there attempts to also create formal standards out of them? What are de facto standards, how are they developed in the FLOSS community, and what is their importance?

1. Introduction

This term paper discusses how a particular open source software solution, once adopted by a community, becomes a standard over time. Two types of standards are discussed in the paper: the de facto standard, i.e. the market-driven standard that emerges when a critical mass of users begins to use a solution, and the de jure standard, which results when a solution is approved through a formal standards organization. There are many FLOSS solutions that have gained such status. This paper presents a detailed case study of Apache Flink, showing how it was developed and adopted by the community, and how the collaborations within that community made this particular solution a standard.
Apache Flink’s users include renowned companies, educational institutions, software houses, and others, and their number keeps increasing with the passage of time because of the framework’s utility. Each of these aspects is considered in detail in the later sections of this term paper.
2. FLOSS – Free/Libre Open Source Software

In open source software, everyone is freely licensed to use, copy, and redistribute copies of the code, because the source code is freely available to its users; this makes sharing much easier. FLOSS is copyrighted software that is distributed as source code under a license agreement which grants special rights to users of the software – rights that are normally reserved for the author. Such a license allows all users to make and distribute copies of the software binaries and source code without special permission from the author. Furthermore, it allows users to modify the source code and distribute modified copies. What really matters is that open source software is community-owned: it is maintained by the community of people (or companies) that use it, it is freely available on the Internet, and anyone may use it.
More importantly, users are encouraged to improve upon it. By sharing improvements and ideas, and by pooling resources with thousands, even millions of others around the world via the Internet, the open source community is able to create powerful, stable, reliable software at very little cost. But the open source community is much larger than just the people who write the software. Everyone who uses the software participates in a real community and has a voice in its direction. You don’t have to be a programmer: by merely reporting a bug to a program’s author, or writing a simple “how-to” article, you contribute to the community and help to make the software better.
Open-source software is written, documented, distributed, and supported by the people who use it. That means it is sensitive to your needs, not the needs of a corporation trying to sell it to you. Thus, open source software is freely licensed to:
- use the software
- copy the software
- change the software in any way
and its source code is openly shared.

Importance

Standards implemented in open source software can:
- reduce the risk of lock-in
- improve interoperability
- promote competition on the market

Lock-in is the close binding of a customer to the products or services of a provider, where switching costs and other barriers make it difficult for the customer to change the product or provider. Interoperability is the ability of different systems, technologies, or organizations to work together.
Examples

The following are a few FLOSS solutions that have attracted prominent industry support:
- Linux kernel – an open source operating system kernel
- OpenStack – a free and open source software platform for cloud computing
- Apache web server – free and open source cross-platform web server software

3. Case Study – Apache Flink

Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. Flink is an open-source framework for distributed stream processing that:
- provides results that are accurate, even in the case of out-of-order or late-arriving data
- is stateful and fault-tolerant, and can seamlessly recover from failures while maintaining exactly-once application state
- performs at large scale, running on thousands of nodes with very good throughput and latency characteristics

Earlier, we discussed aligning the type of dataset (bounded vs. unbounded) with the type of execution model (batch vs. streaming). Many of the Flink features listed above – state management, handling of out-of-order data, flexible windowing – are essential for computing accurate results on unbounded datasets and are enabled by Flink’s streaming execution model. The official Apache Flink project [R1] describes Flink as follows: “Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.” The platform offers software developers various application programming interfaces (APIs) to create new applications that are executed on the Flink engine.

3.1 History – The Origins of Apache Flink and Support Over the Years

The origins of Apache Flink can be traced back to June 2008, when Prof. Volker Markl founded the Database Systems and Information Management (DIMA) Group at the Technische Universität (TU) Berlin. Soon after his arrival, he laid out the vision for a massively parallel data processing system based on post-relational user-defined functions, combining database and distributed systems concepts, with the goal of enabling modern data analysis and machine learning for big data.
Prof. Markl’s PhD students Stephan Ewen and Fabian Hüske built the very first prototype and shortly thereafter teamed up with Daniel Warneke, a PhD student in Prof. Odej Kao’s Complex and Distributed IT Systems (CIT) Group at TU Berlin. Soon after, Prof. Markl and Prof. Kao sought to collaborate with additional systems researchers in the greater Berlin area, in order to extend, harden, and validate their initial prototype. In 2009, Prof. Markl and Prof.
Kao, jointly with researchers from Humboldt University (HU) of Berlin and the Hasso Plattner Institute (HPI) in Potsdam, co-wrote a DFG (German Research Foundation) research unit proposal entitled “Stratosphere – Information Management on the Cloud” [R4], which was funded in 2010. This initial DFG grant (spanning 2010–2012) extended the original vision to develop a novel, database-inspired approach to analyze, aggregate, and query very large collections of either textual or (semi-)structured data on a virtualized, massively parallel cluster architecture. The follow-on DFG proposal, entitled “Stratosphere II: Advanced Analytics for Big Data,” was also jointly co-written by researchers at TU Berlin, HU Berlin, and HPI, and was funded in 2012. This second DFG grant (spanning 2012–2015) shifted the focus towards the low-latency processing of complex data analysis programs. These early initiatives, coupled with grants from the EU FP7 and Horizon 2020 Programmes, EIT Digital, German Federal Ministries (BMBF and BMWi), and industrial grants from IBM, HP, and Deutsche Telekom, among others, provided the financial resources necessary to lay the initial foundation.
Certainly, funding plays a critical role; however, success could only be achieved with the support of numerous collaborators, including members at DFKI (the German Research Centre for Artificial Intelligence), SICS (the Swedish Institute of Computer Science), and SZTAKI (the Hungarian Academy of Sciences), among many others who believed in the vision, contributed, and provided support over the years. In addition, the contributions of numerous PhD and Master’s students and of postdoctoral researcher Dr. Kostas Tzoumas paved the way for what is today Apache Flink.
Apache Flink began as a fork of Stratosphere that became an Apache Incubator project in March 2014 and then went on to become an Apache Top-Level Project in December 2014. In late 2014, Kostas Tzoumas and Stephan Ewen, along with many of the original creators of the Apache Flink project, founded data Artisans, a company focused on making Flink the next-generation open source platform for programming data-intensive applications. data Artisans started with a seed financing round of 1 million euros from b-to-v Partners in summer 2014, and raised a Series A round of 5.5 million euros led by Intel Capital, with participation from b-to-v Partners and Tengelmann Ventures, in April 2016. Since the company was founded, many team members from data Artisans (flink.apache.org/community.html#people) have been active contributors to Apache Flink. Collectively, these efforts showcase the path from a research idea to an open source software system that is in use across many companies, software projects, universities, and research institutions worldwide. Apache Flink is today one of the most active open source projects in the Apache Software Foundation, with users in academia and industry, as well as contributors and communities all around the world.
3.2 What does Flink provide?

Flink is best suited for:
- A variety of data sources: when data is generated by millions of different users or devices, it is safe to assume that some events will arrive out of the order in which they actually occurred, and in the case of more significant upstream failures, some events might arrive hours later than they are supposed to. Late data needs to be handled so that results are accurate.
- Applications with state: when applications become more complex than simple filtering or enrichment of single data records, managing state within these applications (e.g., counters, windows of past data, state machines, embedded databases) becomes hard. Flink provides tools so that state is efficient, fault-tolerant, and manageable from the outside, so you don’t have to build these capabilities yourself.
- Data that is processed quickly: these use cases focus on real-time or near-real-time scenarios, where insights from data should be available at nearly the same moment that the data is generated. Flink is fully capable of meeting these latency requirements when necessary.
- Data in large volumes: these programs would need to be distributed across many nodes (in some cases, thousands) to support the required scale. Flink can run on large clusters just as seamlessly as it runs on small ones.
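The first two points above, handling out-of-order events and keeping per-window state, can be illustrated with a short sketch. This is not Flink’s actual API; it is a minimal, hypothetical Python model of event-time tumbling windows that are finalized by a watermark, with the window size, lateness bound, and sample events invented purely for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 10   # tumbling windows of 10 time units
MAX_LATENESS = 5   # how far the watermark trails the max seen event time

def tumbling_window_counts(events):
    """Count events per event-time window, tolerating out-of-order arrival.

    `events` is an iterable of (event_time, payload) pairs in *arrival*
    order. A window [start, start + WINDOW_SIZE) is emitted only once the
    watermark (max event time seen so far minus MAX_LATENESS) passes its
    end, loosely mimicking event-time windows with watermarks.
    """
    counts = defaultdict(int)
    max_event_time = float("-inf")
    emitted = []
    for event_time, _payload in events:
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
        counts[window_start] += 1           # late events still land in the right window
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_LATENESS
        # Finalize every window whose end the watermark has passed.
        for start in sorted(w for w in counts if w + WINDOW_SIZE <= watermark):
            emitted.append((start, counts.pop(start)))
    # End of stream: flush whatever is still buffered.
    emitted.extend(sorted(counts.items()))
    return emitted

# Events arrive out of order: event time 12 shows up before 9,
# yet event 9 is still counted in window [0, 10).
stream = [(1, "a"), (4, "b"), (12, "c"), (9, "d"), (17, "e"), (23, "f")]
print(tumbling_window_counts(stream))  # [(0, 3), (10, 2), (20, 1)]
```

The watermark is what lets the sketch delay finalizing a window until late stragglers have had a chance to arrive; a real Flink job expresses the same idea declaratively through watermark strategies and window assigners.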
4. Apache Flink Success Stories

In this section we will look at seven use cases of Apache Flink deployed at Fortune 500 companies. Apache Flink is also known as the 4G of Big Data; to understand its real-life applications, we discuss real-world case studies here.
Apache Flink is deployed in production at leading organizations like Alibaba, Bouygues, and Zalando; below we will see these game-changing use cases of Apache Flink.

Bouygues Telecom – Third largest mobile provider in France

The Bouygues Group ranks in Fortune’s “Global 500.” Bouygues uses Flink for real-time event processing and analytics on billions of messages per day in a system that is running 24/7. Bouygues adopted Apache Flink because it supports true streaming at both the API and the runtime level, with low latency. It also decreased system startup time, which helped the team extend the business logic in the system. Bouygues wanted to get real-time insights into the customer experience, into what is happening globally on the network, and into network evolution and operations.
To fulfil this, the team built a system that analyzes network equipment logs to identify indicators of the quality of the user experience. The system handles 2 billion events per day (500,000 events per second) with a required end-to-end latency of under 200 milliseconds (including message publication by the transport layer and data processing in Flink). This was achieved on a small cluster, reported to be only 10 nodes with 1 gigabyte of memory each.
The plan was to use Flink’s stream processing to transform and enrich data, and to push the derived stream data back to the message transport system for analytics by multiple consumers. This approach was chosen deliberately: Flink’s stream processing capability allowed the Bouygues team to complete the data processing and movement pipeline while meeting the latency requirement, with high reliability, high availability, and ease of use. The Flink framework is also convenient for debugging, for instance, because jobs can be switched to local execution. Flink additionally supports program visualization to help understand how programs are running.
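The shape of the Bouygues pipeline, consume raw events, enrich them, and publish a derived stream back to the message transport, can be sketched as follows. This is a hypothetical illustration: plain Python queues stand in for a real message transport such as Kafka and for Flink’s actual DataStream API, and the field names and lookup table are invented.

```python
import json
from queue import Queue

# Hypothetical reference data used to enrich raw log events.
CELL_REGION = {"cell-1": "Paris", "cell-2": "Lyon"}

def enrich(raw: str) -> str:
    """Parse a raw log line, attach derived fields, and re-serialize it."""
    event = json.loads(raw)
    event["region"] = CELL_REGION.get(event["cell_id"], "unknown")
    event["error"] = event["status"] >= 500
    return json.dumps(event)

def run_pipeline(source: Queue, sink: Queue) -> None:
    """Consume from the input 'topic', enrich, publish to the output 'topic'."""
    while not source.empty():
        sink.put(enrich(source.get()))

# Simulate the message transport with two in-memory queues.
source, sink = Queue(), Queue()
source.put(json.dumps({"cell_id": "cell-1", "status": 200}))
source.put(json.dumps({"cell_id": "cell-9", "status": 503}))
run_pipeline(source, sink)
print(json.loads(sink.get()))  # first enriched event, now carrying region/error fields
```

In the real deployment this per-record enrichment runs continuously and in parallel across the cluster, which is what lets Flink meet the sub-200-millisecond end-to-end latency described above.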
Furthermore, the Flink APIs are attractive to both developers and data scientists.

King – The creator of Candy Crush Saga

King, the leading online entertainment company, has developed more than 200 games that are played in more than 200 countries and regions. Any stream analytics use case becomes a real technical challenge when more than 300 million monthly users generate more than 30 billion events every day across the different games and systems. Handling these massive data streams with data analytics while keeping maximal flexibility was a great challenge, and it was overcome with Apache Flink. Flink allows data scientists at King to access these massive data streams in real time.
Even with such a complex game application, Flink is able to provide an out-of-the-box solution.

Zalando – Leading e-commerce company in Europe

Zalando has more than 16 million customers worldwide and uses Apache Flink for real-time process monitoring. A stream-based architecture nicely supports the microservices approach used by Zalando, and Flink provides stream processing for business process monitoring and continuous Extract, Transform, and Load (ETL).

Otto Group – World’s second largest online retailer

The Otto Group BI department was planning to develop its own streaming engine for processing its huge volumes of data, as none of the open source options fit its requirements.
After testing Flink, the department found it fit for crowdsourced user-agent identification and for identifying search sessions via stream processing.

ResearchGate – Largest academic social network

ResearchGate has been using Flink since 2014 as one of the primary tools in its data infrastructure for both batch and stream processing. It uses Flink for network analysis and near-duplicate detection, to enable a flawless experience for its members.

Alibaba Group – World’s largest retailer

Alibaba works with buyers and suppliers through its web portal.
A variation of Flink (called Blink) is used by the company for online recommendations. Apache Flink gives it the ability to take into account the purchases made during the day when recommending products to users. This plays a key role on special days (holidays) when activity is unusually high.
This is an example where efficient stream processing wins over batch processing.

Capital One – Fortune 500 financial services company

As a leading consumer and commercial banking institution, the company faced the challenge of monitoring customer activity data in real time. It wanted this capability in order to detect and resolve customer issues immediately and to enable a flawless digital enterprise experience. Its legacy systems were quite expensive and offered limited capabilities to handle this. Apache Flink provided a real-time event processing system that was cost-effective and future-proof enough to handle growing customer activity data.
5. De Jure Versus De Facto

De jure standards, or standards according to law, are endorsed by a formal standards organization. The organization ratifies each standard through its official procedures and gives the standard its stamp of approval. De facto standards, or standards in actuality, are adopted widely by an industry and its customers. They are also known as market-driven standards. These standards arise when a critical mass simply likes them well enough to collectively use them. Market-driven standards can become de jure standards if they are approved through a formal standards organization.
Formal standards organizations that create de jure standards have well-documented processes that must be followed. The processes can seem complex or even rigid. But they are necessary to ensure things like repeatability, quality, and safety. The standards organizations themselves may undergo periodic audits. Organizations that develop de jure standards are open for all interested parties to participate. Anyone with a material interest can become a member of a standards committee within these organizations. Consensus is a necessary ingredient. Different organizations have different membership rules and definitions of consensus.
For example, most organizations charge membership fees (always remember that standards development is not free), which vary quite a bit. And some organizations consider consensus to be a simple majority while others require 75% approval for a measure to pass.Because of the processes involved, de jure standards can be slow to produce. Development and approval cycles can take time as each documented step is followed through the process. Achieving consensus, while important and good, can be a lengthy activity. This is especially apparent when not all members of the committee want the standard to succeed. For various reasons—often competitive business—participants in a committee are there to stall or halt the standard. However, once a de jure standard completes the entire process, the implementers and consumers of the standard gain a high level of confidence that it will serve their needs well.
De facto standards are brought about in a variety of ways. They can be closed or open, controlled or uncontrolled, owned by a few or by many, available to everyone or only to approved users. De facto standards can include proprietary and open standards alike.
5.1 Apache Flink as a de-facto standard

Apache Flink is widely used nowadays due to its tremendous new features, and it has been adopted by many well-regarded organizations. Apache Flink now has a community of over 180 contributors worldwide, more than 10,000 attendees at regular meetups in many cities in Europe, the USA, South America, and Asia, at least 13 companies using it in production, many more research projects and academic institutions, as well as a startup that has attracted VC funding of more than 6 million euros. A few statistics that depict Apache Flink as a de-facto standard follow.

The Apache Flink Community

As of May 31, 2016 there are 186 contributors (as reflected on GitHub, github.com/apache/flink), 33 meetups worldwide (meetup.com/topics/apache-flink/), over 6,300 Apache Flink meetup members, and almost 4,800 meetup members in big-data-related groups where Apache Flink is also a topic of interest.

Community Growth Rate

The following GitHub statistics on the growth of the Apache Flink community over recent years tell a success story. At the time of writing:
- Contributors have increased from 258 in December 2016 to 352 in December 2017 (up 36%)
- Stars have increased from 1,830 in December 2016 to 3,036 in December 2017 (up 65%)
- Forks have increased from 1,255 in December 2016 to 2,070 in December 2017 (up 65%)

(Figure: Apache Flink meetup locations worldwide.)

Worldwide Use of Apache Flink

Statistics on Apache Flink usage worldwide show the USA as the biggest user of Apache Flink, with Germany in second position.

Usage in Different Organizations

(Figure: Apache Flink usage in different organizations.)

6. Conclusion

“We are afraid of ideas, of experimenting, of change. We shrink from thinking a problem through to a logical conclusion.” – Anne Sullivan

Having analyzed the results obtained above, we can conclude that the open source community is growing day by day. The preceding discussion shows how a particular FLOSS solution, Apache Flink, became a de-facto standard – a custom or convention that has achieved a dominant position by public acceptance or market forces – through its broad acceptance and its steadily increasing usage worldwide, which in turn forms the basis for attempts at formal standardization. From the discussion above we learned that de facto standards (market-driven standards) are standards that are widely accepted by an industry and its customers; they come into existence when a critical mass simply considers them good enough to use collectively.
From the above information and research we also observed that:
- Market-driven (de facto) standards can become de jure standards if they are approved through a formal standards organization.
- The fact that a standard is a de facto standard does not mean that it is the best. De facto standards often achieve their status because they were the first to arrive on the market, or because a dominating organization imposes the standard on others, forcing its usage. Often, inferior de facto standards persist because of the costs involved in attempting to switch to another standard.
So both kinds of standards, de facto as well as de jure, play an important role. We conclude from this report that Apache Flink has reached a level of acceptance that makes it a de-facto standard, and that this broad adoption lays the groundwork for formal acceptance by standards institutes. This particular case study thus shows how Apache Flink reached its status.
References

- Juan Soto, “A Historical Account of Apache Flink™: Its Origins, Growing Community, and Global Impact,” Technische Universität Berlin, May 31, 2016. http://www.dima.tu-berlin.de/fileadmin/fg131/Informationsmaterial/Apache_Flink_Origins_for_Public_Release.pdf
- “Apache Flink in 2017: Year in Review.” https://flink.apache.org/news/2017/12/21/2017-year-in-review.html