Do not hesitate

Let the rainbow shining!

LGBTQ data story

It matters to you and me

Do you know how many countries are still criminalizing consensual same-sex activity? Can you imagine being maliciously insulted just for pursuing love? Those are the things LGBTQ people are going through: for instance, only 24% of countries allowed for Same-sex marriage, nearly 95% of countries do not have constitutional protections toward LGBTQ, only 22.1% of countries allowed for Joint Adoption for LGBTQ couples. Therefore, it is meaningful to analyse the reasons behind those phenomenons.

Today we will utilize several datasets to take you deeper into this community, explore the causes of their injustice situation, and study possible ways to create a better world for this minority group.

Carousel Bootstrap Second


What is inside the quotations



Let's take a look of the Sentiment

The words of important people in society have a great influence on the public and represent to some extent the current ideology in society. Unexpected information can be acquired through studying those words. In this and the next chapters, we will use the Quotebank dataset to study the words of famous speakers from 2015 and 2020. First, let’s have an overview of how people think of this minority group over the years. Have all the efforts and social campaigns made a difference?

To explore the pattern of speakers’ attitudes toward the LGBTQ group, we carry out sentiment analysis on those quotations, and the results are shown in the following pie charts.

From those plots, we can see that the portion of negative attitudes is almost always higher than the portion of positive attitudes. The small “triumph” of positive over negative in 2020 can be a result of the lack of data. However, things are getting better. The negative proportion is decreasing while the positive proportion is increasing from 2018 to 2020.


What about the speakers?

Now let’s look more in detail into the speakers. How do speakers from different groups think about LGBTQ people? We now divide the speakers into different groups according to some specific features (age, nationality, gender, ethnic group, occupation, and religion), and study the change of their attention on this topic over the years.

From the plots, we can see several interesting results. When studying speakers’ age, we divide the people into three groups. We find that the middle-aged group cares the most about the LGBTQ topic, and their proportion has been increasing over the years. This can be explained by the fact that most of the famous speakers who are authorities or representatives of their areas are in their midlife, it is their responsibility to pay more attention to this sensitive topic. Besides, let’s focus on the ethnic group, and we find that African Americans pay the most attention to this topic, overpassing other groups a lot. This may result from the African American group’s unjust treatment in the US, which makes them more concerned about social equality.

we now move on to see the change of distribution of people’s attitudes toward that topic over the years. We plot the following graph for analysis, the x-axis means the feature we choose (date of birth, gender, nationality, and so on), each bar represents a kind of feature value in each group. For features with more than 10 values, we just choose the top 10 kinds. The y axis stands for the proportion of each emotion of this group.

As is shown in the emotion distribution in different ages, we can see that most of the positive attitudes holders are young people.

In the emotion distribution among different nationalities, we can find some interesting facts: people from Israel usually have negative attitudes towards the LGBTQ community, while Ireland people tend to think in a positive way of the group. And then we focus on the top 3 countries that have the most attention, i.e. US, UK, Canada. We can find that, in the US, negative emotion is usually overpassing positive emotion. In the UK, the proportion of positive emotion has been increasing. In Canada, the negative emotion and positive emotion proportions are close.

Some interesting facts pop out when studying the speakers’ occupation. Most politicians, writers, lawyers, and journalists have negative emotions towards the LGBTQ community, while the people whose jobs are related to TV or film, like actors, singers, film producers, and so on, usually have positive attitudes toward LGBTQ.

Finally, let’s focus on the speakers’ religion, especially on the Judaism and Catholic Church. We can see that the negative emotion ratio is so large in these two groups. In 2017, their negative emotion ratio is 70%, which is more than the average negative emotion ratio (46.2%).

What are the topics around LGBTQ?

After the sentiment analysis, we can observe some patterns from different individual groups. However, this is not enough, as we can not be sure about what resulted in those emotions. In other words, we want to dig out the problems and events that are hidden behind those emotions, to see how they are related to LGBTQ, and from these problems, we are able to start the journey of searching for possible solutions to build a better LGBTQ-friendly community. Therefore, here we aim at a bigger scope to analyze the topics that co-occur with LGBTQ. In this part we want to be more precise and convincing about the topics that we extracted. Further, we want to use topic aggregations to have deeper understanding of the possible reasons. Here we directly apply the integrated API as BERT-Topic, whose method scheme is as follows:

  1. Gather LGBTQ quotations as one document (lists of quotations)
  2. Perform text embedding using Transformer language models (with pre-trained weight loaded)
  3. Apply aggregation algorithm to text embeddings using UMAP (for dimension reduction) and HDBSAN (for clustering) algorithm
  4. extract topics using class-based TF-IDF algorithm

After extracting the topics, we sort them by the frequency each topic appeared in the quotations. Then we chose the top-17 most related ones, each of which is represented by a set of words following weighted distributions. Then we assign each of the clusters a meaningful name. Here we give three examples of word-represented topics:


How people pay attention to these topics?

Firstly, we want to see what portion each topic shares, through which we can see in details what topics people care about the most.

From the above plot, we can see that the “Politics” topic is talked about the most frequently, as may be the most efficient and direct way to solve the LGBTQ people’s problems is through legislation and promulgation of policies to protect the rights of the LGBTQ community. We can also observe several LGBTQ-related problems from these topics, such as “Foster”, “Health”, “Homeless”, “HIV”, and “Suicide”, which we will discuss more later. Before we go deeper into the above topics, let’s take a look at the speakers behind these topics. Here we chose the top 20 people with frequent LGBTQ-related topic quotations, and compute the portions of quotations they hold:


Who is behind these topics?

Before we go deeper into the above topics, let’s take a look at the speakers behind these topics. Here we chose the top 20 people with frequent LGBTQ-related topic quotations, and compute the portions of quotations they hold:

Most of these names may not be familiar to you (except two famous persons in Quotebank – Donald Trump and Hillary Clinton),


Three LGBTQ activists in QuoteBank


Sarah Kate Ellis

Ellis was appointed president and CEO of GLAAD, the nation’s largest lesbian, gay, bisexual, and transgender (LGBT) media advocacy organization.

“This loss is a wake-up call that despite remarkable progress for LGBT equality, we must never become complacent in the face of injustice.”


Chad Griffin

Griffin served as president of the Human Rights Campaign, the largest LGBT rights organization in the United States

“We will also continue our efforts to advance critically-needed protections at the state, local, and ultimately the federal level for LGBT people all across this country.”


Annise Parker

Annise Parker was the first openly LGBTQ mayor of a major American city.

“What a joyous, historic day for America, the LGBT community, individual families, the institution of marriage and the fight for equality. I used to think that I would not see this day in my lifetime.”


How famous people care about famous topics?

We also want to know how does each of these famous LGBTQ topic-related speakers pay attention to the top LGBTQ-related topics, and here we depict the top-10 persons with portions for each topic:

It is interesting that these famous speakers’ attention does not follow the overall topic attention proportion. For example, “Foster” problems are more frequently discussed than “Politics” problems, where the latter is the top-1 topic in our previous analysis. And the overall less heated topic “Fight” (which is related to “Fighting for LGBTQ Rights”) is much more frequently talked about by these famous people. Besides, some people show preference toward certain topics, such as Dwyane Wade has more quotations relevant to problems such as “Homeless” and “Foster”, while hardly talking about “Politics”-related LGBTQ topics.


How these topics change over time?

After seeing “what” are the LGBTQ-related topics and by “who” these topics are discussed, now we want to see “when” these topics are talked about, and how the attention evolves over time:

As we can see, the order of topics’ frequency over the last five years mostly follows the distribution of their overall portions in the quotations. We also observe an interesting pattern that the frequency of all these topics increases from March to May, and there is a surge of political topics near the end of 2018.


What problems is behind the topics?

Now we go back to the initial problem – what are the problems and events that are behind these topics? And how much attention they have drawn? What problems have been relatively neglected? To directly inspect these questions, we build a topic network for each topic cluster and give each node a weight as:

        W = Topic_Freq * Word_Prob

where Topic_Freq is the overall topic frequency over the whole QuoteBank dataset from the year 2015 to 2020, and the Word_Prob refers to the word distribution probability among the topic.

And we build the connection between nodes following the word embedding clusters given by the HDBCAN clustering algorithm from BertTopic Model. Finally, we obtain the network like this:

Here each node in this network could represent one or more LGBTQ-related quotations (you can hover on them to see examples). And we could see that some problems such as “Mental Health” could contain some worth-noted sub-topics such as “therapy” and “abuse”. And some topics are actually related to others, such as the “suicide” problem is actually related to “mental health” and “teens” (you can see the orange “suicide” node at lower right connected with “Mental Health” node, and the pink teens node in the lower left contected with the “Suicide” node).

So, after witnessing all these issues, we want to know why these happens in our human society, further, we want to discuss about possible solutions.

Unfortunately, there’s only so much that our old friend QuoteBank can do for us, and in the belowing story, we would like to introducing two new dataset:

  • WorldBank:

    WorldBank data contains various of statistical data from different countries and regions over the world, such as economy, education, employment, environment and population. This dataset maintains a number of macro, financial and sector databases where much of the them comes from the statistical systems of member countries.

    We mainly use worldbank data for analyzing the factors that lead to different acceptance levels towards LGBTQ community of different countries and regions.

  • Sexual Orientation Laws in the World:

    This dataset maintains different LGBTQ-related national policies of different countries, such as same-sex marriage, conversion therapy, penalty, adoption & fostering and employment protection of LGBTQ people. The data mainly from ILGA World along with the State-Sponsored Homophobia report.

    We use this dataset for evaluating LGBTQ acceptance levels of different countries. By quantifying the policies into scores, we combine this with the WorldBank dataset for regression analysis of different factors.

with the help of these two datasets, we could follow the clues we find so far from QuoteBank to dig deeper and finally find the truth behind all these quotations.

Situation around the world

From the analysis during last section, we see that some topics are frequently discussed, and the most frequent topic cluster is about politics, as we could guess that the most efficient way to solve the inequality and discrimination problem toward LGBTQ group is enacting policies and legislation. And a LGBTQ-friendly policy system is perhaps the very first step for dealing with other LGBTQ-related social issues, such as:

  • Homeless: LGBTQ Employement Protection
  • Foster: Adoption open to same-sex couples
  • Health: Social welfare policy
  • Family recognition: Marriage or other foms of legal union for LGBTQ persons.

Therefore, let’s take a look at the policy and laws in different countries toward LGBTQ, analyze factors that may affect such policies and finally try to find some workable solutions to build a better LGBTQ-friendly society.


Policies of different concerns

First of all, we want to give a overview of the 9 LGBTQ-related policies with respect to countries and regions, to roughly evaluate the acceptance level our world toward LGBTQ communities.

# Policy Explaination
1 CSSSA Legal Criminalisation of Consensual Same-Sex Sexual Acts between Adults
2 Penalty Penalty toward LGBTQ people prescribed by law
3 Constitutional Protection Whether constitutional rights protected by law
4 Broad Protection Whether have law-based right protection
5 Employment Protection Whether protect the employment right
6 Ban Conversion Therapies Whether sex conversion therapies is banned
7 Allow Same-Sex Marriage Whether LGBTQ legal partner recognized by law
8 Allow Joint Adoption Whether allow joint adoption by a same-sex couple
9 Allow Second Parent Adoption Whether allow adoption adoption by one partner of the other's biological child

It’s disappointing to see that the situation here is not optimistic. For example, nealy 95% countries do not have constitutional protections toward LGBTQ, only 24% countries allowed for Same-Sex Marriage, and only 22.1% countries allowed for Joint Adoption for LGBTQ couples. This could cause some very ridiculous and awful situation. For example, in some same-sex marriage illegal countries, LGBTQ couples are not recognized by law, not only they can not adopt child for fostering, a even more severe concequence is, If one partner dies, the property cannot be inherited by the other partner, but instead forced to be inherited by other relatives by blood.

One thing that seems “optimistic” from above is that only 6% countries have banned the conversion therapies. However, this is not really the case, even those countries do not explicitly ban the therapy, the conversion is still illegal in some countries, and some transgenders could lose their legally recognized identity information after conversion, which could restrict all aspects of their social life. Moreover, if the countries do not standardise conversion therapy, some private clinics could carry out high-risk surgery for profit.


Quantifying the acceptance level

To better visualize the policies of different counrties, we decide to quantitize different values of each policies. Specifically, we divide policies into two categories as benefit policies and penalty policies. For example, for the “Employment Protection” policy, we give a score 2 for countries with such policy, 1 for those with limited (incomplete) policy and 0 for those without. And we assign different negative scores for different penalty policies. Overall we control the scores in every country falling in the range of [-16, 16], and finally depict in the world map:

It seems like most LGBTQ-friendly countries are located at the Western Europe, North/South America, South Africa and Oceania. And most Asia show neutral or slightly positive acceptance scores, while most of LGBTQ-unfriendly countries are located in Mid-east, North and East Africa.


What might be the factors?

We are wondering what factor could cause the differences in these countries, which may further guide us for coming up with possible solutions.


Economical Situation

At first glance of the LGBTQ acceptance score map, most developed countries are LGBTQ-friendly, from such fact we want to know whether a country’s economical situation is positively related to its LGBTQ acceptance score.

Here we conduct the regression analysis for different regions in the world, and choose the independent variable to be GDP per-capita instead of overall GDP score, as the latter could be largely affected by the country size and populations.

Just like what we have expected, regression coefficients in most countries are positive, while there are two exeptions: North America and Latin America & Caribbean. The coefficient in North America is trivial as there only two contries with already very positive LGBTQ scores, while only varies in one policy. And the situation in Latin America & Caribbean is a bit more complex, as from the world map these area are mostly LGBTQ-friendly, but there are several tiny regions and island countries, which we believe could involve other main factors such as culture and isolation.


Education Level

Secondly, we explore the relationship between the education level and acceptance of LGBTQ, we conduct the regression analysis between the education ratio for population over 25 years old with respect to two education levels: secondary education and bachelor education.

Here the trend is more clear, both education levels have positive coefficient with LGBTQ acceptance scores. And if we look at the countries with more than 80% 25+ population received at least secondary education, there is only 1 country with negative acceptance score, while all other 18 countries are all LGBTQ-friendly.


End of the story

Finally, the story is coming to the end, and we want to make several conclusions:

Firstly, people’s attention towards the LGBTQ community is not enough, lots of efforts are needed to build a more caring world.

Secondly, in general, most people still have a negative attitude towards the LGBTQ group. However, the situation is getting better, which is a good sign.

Thirdly, based on the topic analysis around Quotebank, there still exists plenty of disparities in mental and physical health that LGBTQ groups would face, several efforts are required to improve current situations.

Then We find two potential factors that can be used to improve this situation: 1. most regions in the world with higher GDP level have LGBTQ acceptance level. We assume that, countries with better finantial situation are more likely to establish policies that could help build a LGBTQ-friendly society. 2. people from countries with higher average education levels will treat LGBTQ people more friendly. This means that education can play an important role in helping improve the situation of the LGBTQ community, such as the LGBTQ-involvement education that has been more and more adopted by countries to educate the new generations for the awareness of equatily and diversity.

We are aware that even with these discoveries, there is no quick fix for this problem. Still, we hope that these findings will serve as a source of inspiration. In a word, we are all involved in this story, and it’s us who could choose to make a good ending for it.”.


Guosheng Feng

Yifeng Chen

Duo Xu

Aibin Yu

Copyright © decls website 2021