Internet companies' valuations are based on the number of their daily users.

We verify how many of them are real people, unique users, or bots.

Internet advertising costs depend on exposure and clicks

We distinguish between exposure to real people, to bots and fake clicks

Someone else is making money from your web content

We make sure your content consumers are real people and not robots, crawlers and scrapers.

Website Owners

Know your REAL customers. Serve real people. Spend your storage, bandwidth and computing power on real users, not fake ones. Provide your advertisers with REAL value, not just fake clicks and exposure.

Investors

Reliable people verification must be external, independent, totally transparent to you, absolutely discreet towards any other 3rd party, and must use tools that do not interfere with the user experience.

Advertisers

Know the real value behind the fantastic numbers. Measure the real users, real traffic, real engagement, real value for your money. How many humans are exposed to your ads, and how many of the viewers are just bots or the same people with multiple internet identities?

Real users = real value. Fake users = fake value.

We are all amazed at the massive amount of users on social media. We all develop a technological sense of appreciation based on the amount of views, likes, comments, and shares of topics, people, products, and services.

These numbers have a tremendous subconscious effect on how we relate to a particular subject. An issue that gains millions of fans immediately gets a clear advantage from us over an issue that gets very few fans and very many dislikes and negative comments and reviews.

Internet companies' valuations are based on the number of their daily active users

Some users are real, some are bots, and some are the same people with multiple user names.

A Modern Approach to Verifying Humanity Without Compromising the User Experience

Verified profile schemes help you catch bad actors on your platform and enable genuine users to elevate their status. This, in turn, builds trust and gives customers more confidence in your services. Our technology runs in the background, without interfering with the user experience. We are completely transparent to human users, and only come into action when we detect a bot, troll, scam, or impersonation, preventing malicious bots from abusing online services.

What is Anti-bot Verification?

Anti-bot verification helps prevent automated spam posts, which are posts created by scripts or programs instead of people.

A Smarter Approach to Bot Management

VerifyPeople.com provides a revolutionary bot management solution that protects websites, mobile apps and APIs from malicious attacks such as scraping, credential stuffing and account takeover.

Behavioral differences between bots and humans

Bots are social media accounts which are controlled by software rather than by humans and serve a variety of purposes, from news aggregation to automated customer assistance for online retailers. However, bots have recently been under the spotlight as they are regularly employed as part of large-scale efforts on social media to manipulate public opinion, such as during electoral campaigns.

A new study has revealed the presence of short-term behavioral trends in humans that are absent in social media bots, providing an example of a ‘human signature’ on social media which could be leveraged to develop more sophisticated bot detection strategies. The research is the first study of its kind to apply user behavior over a social media session to the problem of bot detection.

“Remarkably, bots continuously improve to mimic more and more of the behavior humans typically exhibit on social media. Every time we identify a characteristic we think is prerogative of human behavior, such as sentiment of topics of interest, we soon discover that newly-developed open-source bots can now capture those aspects,” says co-author Emilio Ferrara, Assistant Professor of Computer Science and Research Team Leader at the University of Southern California Information Sciences Institute.

In this work, the researchers studied how the behavior of humans and bots changed over the course of an activity session using a large Twitter dataset associated with recent political events. Over the course of these sessions, the researchers measured various factors to capture user behavior, including the propensity to engage in social interactions and the amount of produced content, and then compared these results between bots and humans.

To study the behavior of bot and human users over an activity session, the researchers focused on indicators of the quantity and quality of social interactions a user engaged in, including the number of retweets, replies and mentions, as well as the length of the tweet itself. They then leveraged these behavioral results to inform a classification system for bot detection to observe whether the inclusion of features describing the session dynamics could improve the performance of the detector. A range of machine learning techniques were used to train two different sets of classifiers: one including the features describing the session dynamics and one without those features, as a baseline.

The researchers found, among humans, trends that were not present among bots: Humans showed an increase in the amount of social interaction over the course of a session, illustrated by an increase in the fraction of retweets, replies and number of mentions contained in a tweet. Humans also showed a decrease in the amount of content produced, illustrated by a decreasing trend in average tweet length. These trends are thought to be due to the fact that as sessions progress, human users grow tired and are less likely to undertake complex activities, such as composing original content. Another possible explanation may be given by the fact that as time goes by, users are exposed to more posts, therefore increasing their probability to react and interact with content. In both cases, bots were shown to not be affected by such considerations and no behavioral change was observed from them.

The researchers used these behavioral results to inform a classification system for bot detection and found that the full model, which includes the features describing session dynamics, detected bots significantly more accurately than the baseline model, which omits those features.

These results highlight that user behavior on social media evolves in a measurably different manner between bots and humans over an activity session and also suggest that these differences can be used to implement a bot detection system or to improve existing ones.

Emilio highlights: “Bots are constantly evolving – with fast paced advancements in AI, it’s possible to create ever-increasingly realistic bots that can mimic more and more how we talk and interact in online platforms.”

“We are continuously trying to identify dimensions that are particular to the behavior of humans on social media that can in turn be used to develop more sophisticated toolkits to detect bots.”

Measuring Bot and Human Behavioral Dynamics

Bots, social media accounts controlled by software rather than by humans, have recently been under the spotlight for their association with various forms of online manipulation. To date, much work has focused on social bot detection, but little attention has been devoted to the characterization and measurement of the behavior and activity of bots, as opposed to humans'. Over the course of the years, bots have become more sophisticated, and to some extent capable of emulating the short-term behavior of human users. The goal of this paper is to study the behavioral dynamics that bots exhibit over the course of an activity session, and highlight if and how these differ from human activity signatures.

By using a large Twitter dataset associated with recent political events, we first separate bots and humans, then isolate their activity sessions. We compile a list of quantities to be measured, such as the propensity of users to engage in social interactions or to produce content. Our analysis highlights the presence of short-term behavioral trends in humans, which can be associated with a cognitive origin, that are absent in bots, intuitively due to the automated nature of their activity. These findings are finally codified to create and evaluate a machine learning algorithm to detect activity sessions produced by bots and humans, to allow for more nuanced bot detection strategies.

1. Introduction

Social bots are all those social media accounts that are controlled by artificial, as opposed to human, intelligence. Their purposes can be many: news aggregators collect and relay pieces of news from different sources; chatbots can be used as automated customer assistants; however, as a by now large number of studies has shown, the vast majority of bots are employed as part of large-scale efforts to manipulate public opinion or sentiment on social media, such as for viral marketing or electoral campaigns, often with quantifiable effects [1–3].

Scholars' efforts to investigate social bots can roughly be grouped in two categories. On one side, many studies have focused on the theme of bot detection, i.e., on how to identify bot accounts [4–6]. A second line of research deals instead with the impact of bots on society, for example via information spreading and sentiment manipulation [7–9].

The characterization of bot behavior is thus a topic that can yield actionable insights, especially when considered in comparison with the human equivalent. The present work adds to the existing literature in this field by studying the short-term behavioral dynamics, i.e., the temporal evolution of behavioral patterns over the course of an activity session, of the two types of accounts. Prior studies have examined the performance of human users when engaging in continuous online interactions, finding measurable changes, for example, in the amount of reactions to other users' posts, or in the quality (in terms of grammatical correctness and readability) of the produced content [10, 11].

We hypothesize that such human behavioral changes, if at all present, should be starkly different in the case of bot accounts. To investigate the matter, we analyse two Twitter datasets: a collection of posts from the discussion preceding the 2017 French presidential election—a previous study considered the role played by bot accounts in that context, finding evidence of the presence of a large number of such actors [12]; and a dataset, previously presented in Cresci et al. [13], of hand-labeled tweets from three groups of bots active in as many viral campaigns and one group of human users.

2. Contributions of This Work

Over the course of single activity sessions, we measure different quantities capturing user behavior, e.g., propensity to engage in social interactions, or amount of produced content, and finally contrast results between bots and humans.

The present study advances our understanding of bots and human user behavior in the following ways:

• We reveal the presence of short-term behavioral trends among humans that are instead absent in the case of bots. Such trends may be explained by a deterioration of human users' performance (in terms of quality and quantity of content produced), and by an increasing engagement in social interactions over the course of an online session; in both cases, we would not expect bots to be affected, and indeed we find no significant evidence in that respect.

• In the spirit of the research line on bot detection, we codify our findings in a set of highly predictive features capable of separating human and bot activity sessions, and design and evaluate the performance of a machine learning framework that leverages these features. This can prove extremely desirable when trying to detect so-called cyborgs, users that are in part controlled by humans and in part bots. Our classification system yields an accuracy of up to 97% AUC (Area Under the ROC curve), with the addition of the features identified by our analysis yielding an average improvement over the baseline of up to 14% AUC.

3. Background

3.1. What Is a Bot

A bot (short for robot, a.k.a., social bot, social media bot, social spam bot, or sybil account) is a social media account controlled, predominantly or completely, by a piece of software (a more or less sophisticated artificial intelligence), in contrast with accounts controlled by human users [14]. Next, we describe some techniques to create and detect bots.

3.2. How to Create a Bot

Early social media bots, in the 2000s, were created to tackle simple tasks, such as automatically retweeting content posted by a set of sources, or finding and posting news from the Web [14].

Today, the capabilities of bots have significantly improved: bots rely on the fast-paced advancements of Artificial Intelligence, especially in the area of natural language generation, and use pre-trained multilingual models like OpenAI's GPT-2 [15] to generate human-like content. This framework allows the creation of bots that generate genuine-looking short texts on platforms like Twitter, making it harder to distinguish between human and automated accounts [16].

The barriers to bot creation and deployment, as well as the required resources to create large bot networks, have also significantly decreased: for example, it is now possible to rely upon bot-as-a-service (BaaS), to create and distribute large-scale bot networks using pre-existing capabilities provided by companies like ChatBots.io, and run them in cloud infrastructures like Amazon Web Services or Heroku, to make their detection more challenging [17]. For a recent survey of readily-available Twitter bot-making tools, see [2, 12].

3.3. How to Detect Bots

Historically, bot detection techniques have been pioneered by groups at Indiana University, University of Southern California, and University of Maryland, in the context of a program sponsored by DARPA (the U.S. Defense Advanced Research Projects Agency) aimed at detecting bots used for anti-science misinformation [18]. More recently, large bot networks (botnets) have been discovered on Twitter by various academic groups [19, 20].

The literature on bot detection has become very extensive [13, 14, 21, 22]. In Ferrara et al. [14], we proposed a simple taxonomy to divide bot detection approaches into three classes: (i) systems based on social network information; (ii) systems based on crowd-sourcing and the leveraging of human intelligence; (iii) machine learning methods based on the identification of highly-predictive features that discriminate between bots and humans.

Some openly-accessible tools exist to detect bots on platforms like Twitter: (i) Botometer is a bot detection tool developed at Indiana University [6], also used here; (ii) BotSlayer is an application for the detection and tracking of potential manipulation of information on Twitter; (iii) the Bot Repository is a centralized database to share annotated datasets of Twitter bots. Finally, various models have been proposed to detect bots using sophisticated machine learning techniques, such as deep learning [23], anomaly detection [24–26], and time series analysis [27, 28].

4. Data and Methods

Our first dataset, which we label French Elections (FE), consists of a collection of more than 16 M tweets, posted by more than 2 M different users. The tweets were posted between April 25 and May 7, 2017, the 2-week period leading to the second round of the French presidential election. A list of 23 keywords and hashtags was manually compiled and used to collect the data through the Twitter Search API.

To classify the users as bots or humans, we employ the Botometer API, previously known as BotOrNot [6], which provides a free-to-use, feature-based classification system. When queried about a Twitter user name or user ID, Botometer retrieves from Twitter information about more than a thousand features associated with that account, and returns a corresponding bot score. A bot score is a number representing the likelihood for the account to be controlled by a bot, and it ranges from 0 (definitely human) to 1 (definitely bot).

While, as of January 2020, the latest version of Botometer provides two separate scores, one excluding and one including language-dependent features, such a distinction was yet to be implemented at the time of our research. The fact that the FE data contained tweets in different languages was therefore not an issue in this respect. For a more detailed description of the dataset, including its language distribution, see [12].

Here, we use Botometer to calculate the bot score of more than 380 k accounts in our dataset, namely all those that posted at least 5 tweets during the observation time, minus those that have since been deleted (27 k), or whose privacy settings prevented Botometer from accessing the necessary information (15 k accounts). The 380 k users are responsible for more than 12 M out of the overall 16 M tweets.

It is worth noting that Botometer does not use any session-related feature, nor does it incorporate any notion of activity sessions [29]: this is important to guarantee that the behavioral differences discussed below are not just an artifact of the classifier relying on session-based features (which would be circular reasoning).

The distribution of the bot scores is reported in Figure 1. To limit the risk of wrongly classifying a human account, we choose to only label as bots those users with a bot score ranking in the top 5% of the distribution, corresponding to a threshold value of 0.53. This is a conservative strategy informed by the fact that a false positive, i.e., labeling a human user as a bot, is generally associated with a higher cost than a false negative, especially when decisions such as account suspensions are informed by this classification. Furthermore, recent analyses demonstrated that, when studying human and bot interactions via Botometer, results do not significantly vary in the threshold range between 0.4 and 0.6 [29]. According to the same conservative strategy, we set the threshold for humans to 0.4, leaving unlabeled all the accounts with a score value between the two thresholds. Summarizing, we have 19 k users labeled as bots and 290 k users labeled as humans, while the remaining 78 k are left unlabeled.
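
The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the choice of strict versus non-strict comparisons at the two thresholds, and the way the top-5% cut is computed are not taken from the original study.

```python
def quantile_threshold(scores, top_fraction=0.05):
    """Return the lowest score that still belongs to the top `top_fraction`."""
    ordered = sorted(scores)
    # Number of accounts in the top fraction (at least one).
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[-n_top]


def label_accounts(scores, human_threshold=0.4, top_fraction=0.05):
    """Label each account as 'bot', 'human', or 'unlabeled'.

    `scores` maps a (hypothetical) account ID to its bot score in [0, 1].
    Assumption: bots are scores >= the top-fraction cut, humans are
    scores < human_threshold; everything in between stays unlabeled.
    """
    bot_threshold = quantile_threshold(scores.values(), top_fraction)
    labels = {}
    for user, score in scores.items():
        if score >= bot_threshold:
            labels[user] = "bot"
        elif score < human_threshold:
            labels[user] = "human"
        else:
            labels[user] = "unlabeled"
    return labels, bot_threshold
```

On the FE data the text reports a bot threshold of 0.53; in this sketch the threshold is derived from whatever score distribution is passed in.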

5. Results

5.1. Experimental Analysis

Having organized the tweets in sessions, we proceed to study the temporal dynamics of the two categories of users, bots and humans. Our results are summarized in Figure 4. We focus on four quantities: the fraction of retweets (Figure 4A), and the fraction of replies (Figure 4B), among all tweets posted at a certain position in a session; the number of mentions appearing in a tweet (Figure 4C); and the length of the tweet itself, in characters (Figure 4D). We use two different scales for the FE (left) and HL (right) data, to account for the different nature of the two datasets and at the same time to better highlight the behavioral analogies within each group of users. For every measure we plot the mean with its standard error.

As detailed below, the first three of these four features can provide an indicator of the quantity and quality of the social interactions a user engages in over the course of a session. The text length is instead a measure of the amount of content produced by a user. As correlations between the length of a session and the dynamics of performance indicators have been observed on social networks [10, 11], we restrict our analysis to sessions of similar length; we want our sessions to be long enough to exhibit meaningful trends, yet short enough to occur in significant numbers, as the number of sessions consisting of at least N posts decreases rapidly with N (Figure 3). We thus choose to focus on sessions containing 20–25 posts, resulting in a total of 1,500 bot sessions and 13 k human sessions in the FE dataset, and 1,300 bot sessions and 5,800 human sessions in the HL dataset. In the following paragraphs, we detail our findings for each of the four features.

A retweet is a repost of a tweet previously posted by another user. We expect to see an increase in the number of human retweets during the course of a session, as users get exposed to more content and are thus more likely to engage in social interactions. The fraction of retweets over the total number of tweets, grouped by their position in the session (Equation 1), is shown in Figure 4A: in general, the fraction is higher for humans at all positions; in FE, the fraction increases for humans over the whole course of their sessions, starting with a rapid growth in the first 2–3 posts and then slowing down. No equivalent trend appears among bots, which seem instead to oscillate around a constant value.
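
As a rough illustration of the quantity plotted in Figure 4A, the fraction of retweets at each session position can be computed as below. The input layout (one list of booleans per session, in posting order) and the function name are assumptions made for this sketch, not the authors' code.

```python
from collections import defaultdict


def retweet_fraction_by_position(sessions):
    """Fraction of tweets that are retweets, grouped by position in session.

    `sessions` is a list of sessions; each session is a list of booleans
    (True = the tweet at that position is a retweet), in posting order.
    """
    retweets = defaultdict(int)
    totals = defaultdict(int)
    for session in sessions:
        for position, is_retweet in enumerate(session, start=1):
            totals[position] += 1
            retweets[position] += int(is_retweet)
    # One fraction per position, over all sessions that reach that position.
    return {p: retweets[p] / totals[p] for p in sorted(totals)}
```

Running this separately on human and bot sessions gives the two curves to compare; the same grouping works for the fraction of replies.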

The reply (Figure 4B), as the name suggests, is a tweet posted in response to some other tweet. The same considerations as for the retweets apply here: we expect to see the fraction of replies increase over the course of a human session. Our results confirm our expectation: as for the retweets, the fraction of replies increases and decelerates, for humans, over the first 20 tweets; the behavior is similar in the two datasets, with a rapid increase over the first 5–6 tweets, after which the value stabilizes around 0.5. Bots, on the other hand, don't show an analogous increase.

On Twitter, users can mention other users in their posts; another possible measure of social interactions is thus the average number of mentions per post. As for the previous cases, we expect the number of mentions to increase, on average, as human users proceed in their session. The results (Figure 4C) do indeed show an increase in the average number of mentions by humans over the course of the first 20 tweets; as in the case of the fraction of replies, a qualitative similarity between the two groups of human users is also apparent. Again, bots don't seem to change their behavior in the course of the session.

The features analyzed so far are all indicators of the amount of social interactions in which users engage. We now consider the average length (in characters) of a tweet, which is a measure of the amount of content produced and is thus an interesting indicator of the short-term behavioral dynamics. Before counting the number of characters, the tweet is stripped of all URLs, mentions, and hashtags, so as to only account for text effectively composed by the user. A previous study has failed to show any significant variation in this quantity over the course of a short-term session on Twitter [11]; however, analyses of other platforms have shown that the average post length decreases on similar time scales [10]. Here, human data show a clear decreasing trend in FE, whereas no trend emerges for humans or bots in HL (Figure 4D).
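
The stripping step can be approximated with a few regular expressions. The patterns below are simplified assumptions (the exact rules used in the study are not given), and collapsing the leftover whitespace is a choice made for this sketch.

```python
import re

# Simplified, assumed patterns; real tweets may need stricter rules.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def composed_length(text):
    """Length in characters of the text once URLs, mentions and hashtags are removed."""
    # URLs are removed first so that '#' or '@' inside a URL is not treated separately.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub("", text)
    # Collapse the whitespace left behind by the removed tokens.
    return len(" ".join(text.split()))
```

For example, "Hello @bob check https://t.co/abc #news now" reduces to "Hello check now", which has 15 characters.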

Notice that for the last three quantities (replies, mentions, and text length) we have excluded all retweets from our analysis, as their content is not produced by their poster: whereas the fact of posting a retweet can be considered a behavioral indicator, the content of the retweet itself cannot.

In Table 1, we report statistics (mean, and standard deviation in brackets) for the four features considered above, grouped by users. In both datasets, bots tend to post fewer replies and retweets, and to use fewer mentions. The difference is, however, not large enough to be statistically significant as in all cases, except for the retweets in the HL dataset, it falls within one standard deviation. This evidence further contributes to substantiate the point that the differences observed in the behavioral evolution over the course of a session are not just emerging from features that classifiers such as Botometer would already be taking into account—a point that is particularly relevant with regard to the next section, where we show how the introduction of session features can improve account classification.

In general, our experiments reveal the presence of a temporal evolution in the human behavior over the course of a session on an online social network, whereas, confirming our expectations, no evidence is found of a similar evolution for bot accounts. In the next section, we proceed to further investigate the significance of these temporal trends by incorporating them in a classifier for bot detection.

5.2. Prediction

As the experiments described in the previous section show, user behavior, as captured by the four metrics used above (fraction of retweets, fraction of replies, number of mentions, text length), evolves in a measurably different manner between bots and humans (Figure 4). To further investigate this difference, we implement a classifier that, leveraging the quantities considered above, categorizes tweets as either produced by a bot or a human. Using four different off-the-shelf machine learning algorithms, we train our classifier using 10-fold cross-validation on the HL dataset, which provides a reliable ground truth, as explained above and in Cresci et al. [13].

We proceed to organize the dataset in sessions separated by 60 min intervals as described in section 4. As detailed in Table 2, each tweet is tagged with three session features: (i) session ID (i.e., which session the tweet belongs to), (ii) position of the tweet in the session, and (iii) length of the session (as defined in section 4). Six behavioral features are also considered: (iv) whether the tweet is a retweet, or (v) a reply, (vi) the numbers of mentions, (vii) hashtags, (viii) URLs contained in the tweet, and (ix) the text length. We use the nine features to train four classifiers, using as many different techniques: Decision Trees (DT), Extra Trees (ET), Random Forest (RF), and Adaptive Boosting (AB). The purpose of the session ID feature is to allow the classifiers to identify tweets that were posted as part of the same session. To make sure that such identification is possible within but not between the training and testing set, IDs in the latter were obfuscated via a hash function.
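
A minimal sketch of how the three session features might be derived from one user's tweet timestamps follows. It assumes that a new session starts whenever the gap since the previous tweet is 60 minutes or more, and that session length means the number of tweets in the session; both are assumptions, as are the function names and the ID format.

```python
from datetime import datetime, timedelta


def split_into_sessions(timestamps, gap=timedelta(minutes=60)):
    """Group timestamps into sessions, splitting at gaps of `gap` or more."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # continues the current session
        else:
            sessions.append([ts])     # starts a new session
    return sessions


def session_features(timestamps, user_id):
    """Return one row per tweet with session ID, position, and session length."""
    rows = []
    for index, session in enumerate(split_into_sessions(timestamps)):
        for position, ts in enumerate(session, start=1):
            rows.append({
                "session_id": f"{user_id}-{index}",  # hypothetical ID format
                "position": position,
                "session_length": len(session),      # assumed: number of tweets
                "timestamp": ts,
            })
    return rows
```

The six behavioral features would then be attached to each row before training.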

The training and testing of the model is done via 10-fold cross-validation on the entire dataset. As a measure of the performance of the various classifiers, we use the Area Under the Curve of the Receiver Operating Characteristic (shortened as AUC and ROC, respectively). The ROC curve of a binary classifier plots its True Positive Rate against the corresponding False Positive Rate for different values of the decision threshold, i.e., ranging from no positives to all positives (Equations 2, 3).

$$\mathrm{TPR} = \frac{\text{True Positive}}{\text{Positive}}, \qquad \mathrm{FPR} = \frac{\text{False Positive}}{\text{Negative}}. \tag{2}$$

The AUC is usually expressed as a percentage of the maximum attainable value, which would correspond to an ideal classifier (one that has True Positive Rate always equal to one); the higher the AUC, the better the classifier's performance. Notice that a perfectly random binary classifier would have an AUC of 50%.
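
To make the definitions concrete, the following sketch builds the ROC curve by sweeping the decision threshold over the classifier scores and integrates it with the trapezoidal rule. It is a generic textbook construction written for illustration, not the evaluation code used in the study; it assumes labels are 0/1 and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Highest scores first: each step of the sweep admits more items as "positive".
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A classifier that gives every item the same score produces the diagonal from (0, 0) to (1, 1) and an area of 0.5, matching the 50% figure for a random classifier.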

The ROC curves are shown in Figure 5A: all the classifiers report an AUC of 97%, except for the AB, which scores 84%. Aside from the details of the effectiveness of each classifier, the results just described go to show that short-term behavioral patterns can effectively be used to inform bot detection.

To precisely quantify the impact of the introduction of the session dynamics features, we train four more classifiers, equivalent in all respects to the ones described above except for the set of features used for the training: here only the behavioral features (retweet, reply, hashtags, mentions, URLs, text length) are included while the three session features (session ID, position in session, session length) are left out. The four models (again DT, ET, RF, and AB) are trained and tested via 10-fold cross-validation, and the corresponding ROC curves are shown in Figure 5B. The four new models serve as a baseline to compare the full models to; the difference is particularly pronounced for the first three models (DT, ET, RF), for which the AUC is 83% for the baseline versions, 14 points lower than their counterparts trained with all nine features. The AB model also performs worse without the session features (AUC 80%, compared to the 84% obtained with the full features).

All the testing of our classifiers was done, until this point, on the HL dataset. We would now be interested in carrying out some sort of testing on the dataset of the French election tweets introduced in section 4. As this dataset lacks annotations, a proper test cannot be performed, but we can still exploit the Botometer scores to get some information about the performance of our classifiers, and again draw a comparison with the baseline case where session features are omitted. To this purpose, we let the bot threshold (the Botometer score value above which an account is considered a bot) vary over the whole range of possible values (0–1), and for each case compare the results given by the classifiers, trained on the HL dataset as described above, with these “annotations.” Let us remark that our purpose here is to evaluate the effectiveness of the introduction of the session features, and not to exactly evaluate the sensitivity of the classifiers.

The test is performed using the two AB classifiers (the full model and the baseline), and the results are shown in Figure 6. The left part of the graph is not actually very informative, as when the bot threshold is set below 0.4 the “positive” accounts will actually include many humans. It is roughly at the 0.4 value that the True Positive Rate (Equation 2) of the full classifier starts increasing, and although the baseline classifier's TPR increases as well, the full model outperforms the baseline at all points.
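
The threshold sweep can be sketched as follows. The sketch assumes one Botometer score and one classifier prediction per account, which simplifies the setup described above; the function name and input layout are hypothetical.

```python
def tpr_by_threshold(botometer_scores, predicted_bot, thresholds):
    """True Positive Rate of the classifier for each candidate bot threshold.

    For a given threshold, accounts whose Botometer score exceeds it are
    treated as the "positive" annotations; the TPR is the share of those
    accounts that the classifier also flags as bots.
    """
    results = {}
    for threshold in thresholds:
        flagged = [
            pred
            for score, pred in zip(botometer_scores, predicted_bot)
            if score > threshold
        ]
        # No positive annotations at this threshold: TPR is undefined.
        results[threshold] = sum(flagged) / len(flagged) if flagged else None
    return results
```

Calling it once with the full model's predictions and once with the baseline's gives the two curves to compare across thresholds.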

Summarizing, these results suggest that features describing the short-term behavioral dynamics of the users can effectively be employed to implement a bot detection system or to improve existing ones, further confirming that a difference exists in such dynamics between humans and bots.

6. Discussion

The results detailed in the previous two sections provide evidence of the existence of significant differences in the temporal evolution of behavior over the course of an online session between human and bot users.

In particular, in section 5.1 we analyse four different indicators of the users' behavior and find, among humans, trends that are not present among bots: first of all, an increase in the fraction of retweets and replies, and in the number of mentions contained in a tweet, quantities that can all together be seen as a measure of the amount of social interaction a user is taking part in; secondly, a decrease in the amount of content produced, measured as the average tweet length. Such trends are present up to the 20th post in human sessions, whereas the same indicators remain roughly constant for bots. This may be partly due to the fact that, as a session progresses, users grow more tired and become less likely to undertake more complex activities, such as composing an original post [11]. At the same time, we hypothesize that another possible (and possibly concurring) explanation may be given by the fact that, as time goes by, users are exposed to more and more posts, thus increasing their probability to react, for example by retweeting or by mentioning the author of a previous post. In both cases, bots would not be affected by such considerations, and no behavioral change should be expected from them.

In section 5.2, we use the results obtained in section 5.1 to inform a classification system for bot detection. Our purpose there is to highlight how the introduction of features describing the session dynamics (session ID, position of the tweet in the session, and length of the session) can substantially improve the performance of the detector. To this purpose, we use a range of different machine learning techniques (Decision Trees, Extra Trees, Random Forests, Adaptive Boosting), to train, through 10-fold cross-validation, two different sets of classifiers: one including the features describing the session dynamics (the full model), and one without those features (the baseline). The comparison between the two sets of models, carried out both on the annotated dataset used for the cross-validation and on the dataset of tweets concerning the French elections, where Botometer is instead employed, shows that the full model significantly outperforms the baseline.

It is worth noting again that Botometer, while considering temporal features, does not implement any notion of activity sessions, nor does it use any session-based features for bot classification [29]. This ensures that the behavioral differences highlighted in this work are genuine and not simply an artifact of discriminating on features used for classification purposes (which would be circular reasoning); the comparison detailed in section 5.2, where classifiers trained with session features are shown to perform better than their session-blind counterparts, corroborates this claim.
6.1. Related Work

On some occasions, bots have been used for social good, e.g., to deliver positive interventions [32, 33]. Yet, their use is mostly associated with malicious operations. For example, bots have been involved in the manipulation of political conversation [1–3, 7, 34], the spread of disinformation and fake news [8, 12, 21], conspiracy [18], extremist propaganda [35, 36], as well as stock market manipulation [37]. Concerns for public health have also recently emerged [38–41]. This increasing evidence has led our research community to propose a wealth of techniques to address the challenges posed by the pervasive presence of bots on platforms like Facebook and Twitter. Social bot detection is one such example. Our work differs from this literature in that it is not directly aimed at bot detection, yet our findings can be used to inform detection based on bot and human features and behaviors.

The study of bots' characteristics is another recent research thread that has attracted much attention. Researchers discovered that bots exhibit a variety of diverse behaviors, capabilities, and intents [29, 42]. A recent technical memo illustrated novel directions in bot design that leverage Artificial Intelligence (AI): AI bots can generate media and textual content of quality potentially similar to human-generated content but at much larger scale, completely automatically [43]. In this work, we highlighted similarities and dissimilarities between bots' and humans' behavioral characteristics, illustrating the current state of bots' capabilities.

The ability of bots to operate in concert (botnets) has attracted the attention of the cybersecurity research community. Examples of such botnets have been revealed on Twitter [20, 44]. Botnet detection is still in its early stages; however, much of the existing work assumes unrestricted access to social media platform infrastructure. Different social media providers, for example, have applied bot detection techniques in the back-end of other platforms, like Facebook [45, 46] and Renren (a Chinese Twitter-like social platform) [47, 48]. Although these approaches can be valuable and show promising results [45, 49, 50], for example in detecting large-scale bot infiltration, they can be implemented exclusively by social media service providers with full access to data and system infrastructure.

Researchers in academic groups, who do not have unrestricted access to social media data and systems, have proposed many alternative techniques that can work well with smaller samples of user activity and fewer labeled examples of bots and humans. The research presented here is one such example. Other examples include the classification system proposed by Chu et al. [4, 51], the crowd-sourcing detection framework by Wang et al. [52], the NLP-based detection methods by Clark et al. [5], the BotOrNot classifier [6], a Twitter campaign detection system [53, 54], and deep neural detection models [23].

Some historical user activity data is still needed for these methods to function properly, obtained either by indirect data collection [4, 5, 51, 52, 55] or, as in the case of BotOrNot [6], by querying the Twitter API (which imposes strict rate limits, making it of little use for large-scale bot detection). Given these limits, we believe that it is very valuable to have a deep understanding of human and bot behavioral performance dynamics: our findings can inform data collection and annotation strategies, can help improve classification accuracy by injecting expert knowledge and producing better, more informative and predictive features, and can ultimately allow for a better understanding of interaction mechanisms online.
7. Conclusion

In the present work we have investigated the behavioral dynamics of social network users over the course of an online session, with particular attention to the differences emerging between human and bot accounts under this perspective. User session dynamics have been investigated in the literature before but, to the best of our knowledge, never applied to the problem of bot detection.

Our analysis revealed the presence of behavioral trends at the session level among humans that are not observed in bot accounts. We hypothesized two possible mechanisms motivating such trends: on the one hand, humans' performance deteriorates as they engage in prolonged online sessions; this decline has been attributed to a cognitive origin in related work. On the other hand, over the course of their online activity, humans are constantly exposed to posts and messages by other users, so their probability of engaging in social interaction increases. Devising methods to further test each of these two hypotheses could constitute an avenue for future research. Furthermore, the presence of such behavioral differences between the two categories of users can be leveraged to improve bot detection techniques. To investigate this possibility, we trained two categories of classifiers, one including and one excluding features describing the session dynamics. The comparison shows that session features bring an increase of up to 14% AUC, substantially improving the performance of bot detectors. This suggests that features inspired by cognitive dynamics can be useful indicators of human activity signatures. Importantly, the classifier adopted as a baseline does not leverage any session-related features, thus ensuring that the results we observe are genuine and not an artifact of circular reasoning. It may be an interesting subject of future work to better characterize the interplay between the features studied here and other features leveraged by various bot detection techniques, such as the ones mentioned in section 6.1. Overall, our study contributes both to the ongoing investigation around the detection and characterization of social bots, and to the understanding of online human behavior, specifically the short-term dynamical evolution over the course of activity sessions.

Source: https://www.frontiersin.org/articles/10.3389/fphy.2020.00125/full