Social Media: Growth, Data Generated, and Data Consumption

ODSC - Open Data Science
Jun 13, 2023

Social media is a large source of data that has been the subject of a wide variety of research, analytics-driven products, and machine learning solutions. Social media usage keeps rising, and so does the volume of data generated on these platforms. Data practitioners, consumers, and companies that use this data need to understand where it comes from and how content creators, content consumers, and businesses consume it. This article covers recent growth and usage trends across social media platforms, the data generated on different platforms, and how that data is consumed today.

Social Media: Growth

Today, people around the globe use social media daily to stay connected across geographic and economic borders. A rich and diverse set of data is generated every second on social media, including posts, messages, images, videos, comments, views, likes, shares, and more, and people express themselves in many languages, each with their own personal style of communication. Per [1], there were 3.78 billion social media users worldwide in 2021, a five percent increase from 2020. As seen in the plot below, the number of social media users grew from 2.86 billion in 2017 to 3.78 billion in 2021. That is a whopping 32.17% increase in four years, or an average of 230 million new users per year between 2017 and 2021.
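The growth figures above follow from simple arithmetic on the user counts cited from [1]. The short Python sketch below reproduces them. The helper names are illustrative, and "average annual growth" is taken here as the simple linear average (total increase divided by the number of years), not a compounded rate.

```python
def percent_increase(start: float, end: float) -> float:
    """Relative change from start to end, expressed as a percentage."""
    return (end - start) / start * 100


def average_annual_growth(start: float, end: float, years: int) -> float:
    """Simple (non-compounded) average change per year."""
    return (end - start) / years


# Worldwide social media users in billions, as cited from [1].
users_2017 = 2.86
users_2021 = 3.78
years = 2021 - 2017

print(f"Increase 2017-2021: {percent_increase(users_2017, users_2021):.2f}%")
growth_millions = average_annual_growth(users_2017, users_2021, years) * 1000
print(f"Average annual growth: {growth_millions:.0f} million users")
```

Running it prints a 32.17% increase and an average growth of 230 million users per year, which matches the figures in the text.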

Global count of social media users by year.

Of the various social media platforms, market leader Facebook was the first network to surpass one billion registered users. Meta currently owns four of the largest social media platforms: Facebook, WhatsApp, Facebook Messenger, and Instagram. In the third quarter of 2021, Facebook reported over 3.58 billion monthly users across its core family of products. Other platforms are also large: YouTube reported reaching two billion users in October 2020 and generated USD 19.8 billion in revenue in 2020, as per [2]. Twitter reported 186 million users and USD 3.7 billion in revenue in 2020, of which USD 3.2 billion came from advertising and USD 0.5 billion from data licensing and other sources, as per [3]. Instagram generated USD 24 billion in revenue in 2020 and had 1.3 billion users, as per [4].

Most of the popular and widely used social media platforms come from the United States and China. Chinese social networks such as WeChat, QQ, and the video-sharing app Douyin have gained mainstream appeal thanks to their local content and context. The success of Douyin led to the launch of the platform’s international version, known today as TikTok.

Using statistics from the Statista Research Department [5], the image below breaks down the number of active users on the most popular social media networks as of October 2021.

Global active users by social media platform.

Data generated on social media

Let’s look at the nature of data contained within different social media platforms and where this can be found while browsing these social networks.

Some of the popular social media platforms used globally include YouTube, Twitter, Instagram, Facebook, LinkedIn, Reddit, Twitch, and Pinterest. For simplicity, let’s look at a video-centric platform (YouTube), a text-centric platform (Twitter), and an image- and video-centric platform (Instagram), and dive into the details of the data generated on each.

YouTube

YouTube is primarily a video platform where brands or individual content creators post videos and viewers watch them. Viewers are free to comment on, like or dislike, and share videos, and to subscribe to content creators. The types of data generated on a platform like YouTube include the following.

  • Statistical data: measured content affinities such as view counts, comment counts, like counts, dislike counts, share counts, and subscription counts.
  • Video data: the visual (frames) and audio content of videos.
  • Text data: video titles, descriptions, categories, tags filled in by content creators, and comments left by content consumers.
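One way to organize the three kinds of YouTube data listed above is a single record per video. The sketch below is a minimal illustration assuming Python 3.9+; the class and field names are hypothetical and do not mirror the YouTube Data API’s response format. Storing the video as a file reference rather than raw frames and audio is a design assumption that keeps the record lightweight.

```python
from dataclasses import dataclass, field


@dataclass
class YouTubeVideoRecord:
    """Hypothetical container for the data generated around one YouTube video."""

    # Statistical data: measured content affinities.
    view_count: int = 0
    like_count: int = 0
    dislike_count: int = 0
    comment_count: int = 0
    share_count: int = 0
    channel_subscriber_count: int = 0  # subscriptions belong to the creator's channel

    # Video data: a reference to the media file, since frames and audio are large binaries.
    video_path: str = ""

    # Text data written by the content creator.
    title: str = ""
    description: str = ""
    category: str = ""
    tags: list[str] = field(default_factory=list)

    # Text data written by content consumers.
    comments: list[str] = field(default_factory=list)

    def text_fields(self) -> list[str]:
        """Collect all non-empty text, e.g. as input to an NLP pipeline."""
        candidates = [self.title, self.description, self.category, *self.tags, *self.comments]
        return [text for text in candidates if text]


# Example with made-up values.
video = YouTubeVideoRecord(
    view_count=1200,
    title="Intro to Python",
    tags=["python", "tutorial"],
    comments=["Very helpful!"],
)
print(video.text_fields())  # ['Intro to Python', 'python', 'tutorial', 'Very helpful!']
```

Grouping creator-written and viewer-written text in one method makes it easy to feed all of a video’s text into the same analysis while the counts stay available as separate numeric fields.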

Twitter

Twitter is primarily a text-centric platform; however, one can post images and videos as well. Tweets are the main form of content on Twitter, and a standard tweet can contain up to 280 characters of text (the limit was raised from 140 characters in 2017). The types of data include the following.

  • Statistical: measured content affinities in terms of follower counts, following counts, and per-tweet retweet and comment counts.
  • Text: tweet text, user descriptions, and tweet comments.
  • Image and video data: images and videos attached to tweets.
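Follower and following counts describe a user, while retweet and comment counts describe a single tweet, so a sketch of this data might keep them in separate records. The classes and the `split_by_media` helper below are hypothetical, written for Python 3.9+, and are not the Twitter API’s data model; they only show how text-only tweets can be separated from tweets that carry images or videos.

```python
from dataclasses import dataclass, field


@dataclass
class TwitterUser:
    """Hypothetical user-level record: profile text and follow statistics."""
    description: str = ""
    follower_count: int = 0
    following_count: int = 0


@dataclass
class Tweet:
    """Hypothetical tweet-level record: text, engagement counts, and attached media."""
    author: TwitterUser
    text: str
    retweet_count: int = 0
    comment_count: int = 0
    comments: list[str] = field(default_factory=list)
    media_urls: list[str] = field(default_factory=list)  # images or videos, if any


def split_by_media(tweets: list[Tweet]) -> tuple[list[Tweet], list[Tweet]]:
    """Separate text-only tweets from tweets that carry images or videos."""
    text_only = [t for t in tweets if not t.media_urls]
    with_media = [t for t in tweets if t.media_urls]
    return text_only, with_media


# Example with made-up data.
user = TwitterUser(description="Data science enthusiast", follower_count=250)
tweets = [
    Tweet(author=user, text="Reading about social media data today."),
    Tweet(author=user, text="Conference photos!", media_urls=["https://example.com/photo.jpg"]),
]
text_only, with_media = split_by_media(tweets)
print(len(text_only), len(with_media))  # 1 1
```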

Instagram

Instagram is primarily an image- and video-centric platform. From reels and stories to image posts, Instagram is widely used across the globe. It is one of the main platforms for finding out what a celebrity was spotted wearing on a particular day or where they spent their Sunday afternoon. It has also become a platform where many users recreate trending looks and dance videos to gain followers and attention.

  • Statistical: measured content affinities in terms of follower counts, following counts, post likes, shares, and comment counts.
  • Images and videos: content that was shared and generated as a post, story, or reel.
  • Text: post captions, comments, profile descriptions, and tags.

Instagram also supports shoppable posts, which let users purchase products showcased in a post.
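Instagram’s text data includes captions and tags. As a small example of turning that raw text into something countable, the sketch below pulls hashtags out of captions with a regular expression and tallies them across posts. The pattern `#(\w+)` is a simplifying assumption rather than Instagram’s exact hashtag rules, and the captions are made up.

```python
import re
from collections import Counter

# Simplifying assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> list[str]:
    """Return the hashtags in a caption, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(caption)]


def top_hashtags(captions: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count hashtags across many captions and return the n most common."""
    counts = Counter(tag for caption in captions for tag in extract_hashtags(caption))
    return counts.most_common(n)


# Made-up captions for illustration.
captions = [
    "Sunday brunch with friends #food #NYC",
    "Trying the latest trend #dance #food",
    "Weekend practice #Dance",
]
print(extract_hashtags(captions[0]))  # ['food', 'nyc']
print(top_hashtags(captions, n=2))
```

Lowercasing treats `#Dance` and `#dance` as the same tag, which is a design choice; keep the original casing if it matters for your analysis.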

Consumers of social media data

The most familiar way social media data is consumed is by people themselves, who watch and consume the content organically. Anyone, whether a brand, a celebrity, an independent content creator, or a member of the general content-consuming population, is free to create and consume content on social media. People create content, many others consume it, creators produce more of what works and less of what doesn’t, and the cycle continues.

Social media platforms themselves use the bulk of the data generated on them to develop and improve their products, for example to show relevant content for searches, power recommendations, and filter out sensitive content and misinformation.

Social media data is also consumed when providers make it available to businesses, either publicly or under contract. Businesses can then derive insights [6] and use data derived from the raw data to suit their business needs, within the platform’s terms and conditions [7].

For most people, there is no monetary benefit to consuming or creating content. But once the number of people who consume your content grows beyond a certain point, several revenue opportunities open up, primarily ads and sponsored content. Most social media platforms make the majority of their revenue from ads: the more people use a platform, the more reach its ads get, and the more revenue it earns. About 86% of Twitter’s 2020 revenue (USD 3.2 billion of USD 3.7 billion) came from advertising, as per [3]. For individual content creators, a larger following likewise means more revenue potential. As a result, content may stop being fully organic, since revenue considerations begin to drive much of the decision-making.
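The advertising share follows from Twitter’s 2020 revenue breakdown cited from [3]: USD 3.2 billion from advertising and 0.5 from data licensing and other sources, both in billions since they sum to the USD 3.7 billion total. A minimal sketch of the calculation, using those rounded figures (the unrounded figures in [3] may shift the result slightly):

```python
def revenue_share(part: float, total: float) -> float:
    """Share of total revenue contributed by one source, as a percentage."""
    return part / total * 100


# Twitter's 2020 revenue in USD billions, rounded figures as cited from [3].
total_revenue = 3.7
advertising = 3.2
data_licensing_and_other = 0.5

print(f"Advertising: {revenue_share(advertising, total_revenue):.1f}%")
print(f"Data licensing and other: {revenue_share(data_licensing_and_other, total_revenue):.1f}%")
```

With these figures, advertising accounts for about 86.5% of revenue and data licensing and other sources for about 13.5%.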

Resources

[1] Oberlo. (2021). How Many People Use Social Media in 2021? Oberlo. Retrieved December 31, 2021, from https://www.oberlo.com/statistics/how-many-people-use-social-media

[2] Iqbal, M. (2021, December 16). YouTube Revenue and Usage Statistics (2021). Business of Apps. https://www.businessofapps.com/data/youtube-statistics/

[3] Iqbal, M. (2021, November 12). Twitter Revenue and Usage Statistics (2021). Business of Apps. https://www.businessofapps.com/data/twitter-statistics/

[4] Iqbal, M. (2021, November 12). Instagram Revenue and Usage Statistics (2021). Business of Apps. https://www.businessofapps.com/data/instagram-statistics/

[5] Statista Research Department. (2021, November 16). Most popular social networks worldwide as of October 2021, ranked by number of active users. Statista.com. https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

[6] Singh, J. (2021). Social Media Analysis using Natural Language Processing Techniques. Proceedings of the 20th Python in Science Conference, pages 52–58. https://conference.scipy.org/proceedings/scipy2021/pdfs/jyotika_singh.pdf

[7] Singh et al. (2021, March). Method for optimizing media and marketing content using cross-platform video intelligence. Publication number US10949880B2. U.S. Patent and Trademark Office. https://patents.google.com/patent/US10949880B2/en

Cover image by camilo jimenez on Unsplash

About the Author

Jyotika Singh is a researcher, mentor, author, Python programmer, and Data Science practitioner. She currently works as the Director of Data Science at Placemakr, where she leads data intelligence and algorithmic development functions for optimizing operations and revenue. Previously, Jyotika headed the Data Science team at ICX Media (acquired by Salient Global), where she developed novel patented solutions in Machine Learning and Artificial Intelligence that formed the foundation of the business. Jyotika has been working on Natural Language Processing and social media data for 8 years. She is a public speaker and has spoken at over 15 conferences on Python and Data Science. Jyotika has been recognized with several awards in the technology and data space.

Originally posted on OpenDataScience.com

