
Decoding the US Senate Hearing on Oversight of AI: NLP Analysis in Python


Photo by Harold Mendoza on Unsplash

Word frequency analysis, visualization, and sentiment scores using the NLTK toolkit

Raul Vizcarra Chirinos
Towards Data Science

Last Sunday morning, as I was switching TV channels looking for something to watch while having breakfast, I stumbled upon a replay of the Senate Hearing on Oversight of AI. It had only been 40 minutes since it started, so I decided to watch the rest of it (talk about an interesting way to spend a Sunday morning!).

When events like the Senate Judiciary Subcommittee Hearing on Oversight of AI take place and you want to catch up on the key takeaways, you have four options: watch it live; look for a recording afterward (both options will require three hours of your life); read the written version (the transcript), which runs about 79 pages and over 29,000 words; or read reviews on websites or social media to get different opinions and form your own (assuming it isn’t simply borrowed from others).

These days, with everything moving so quickly and our days feeling too short, it’s tempting to take the shortcut and rely on reviews instead of going to the original source (I’ve been there too). If you take the shortcut for this hearing, it’s highly likely that most reviews you’ll find on the web or social media focus on OpenAI CEO Sam Altman’s call for regulating AI. However, after watching the hearing, I felt there was more to explore beyond the headlines.

So, after my Sunday funday morning activity, I decided to download the Senate Hearing transcript and use the NLTK package (a Python package for natural language processing, NLP) to analyze it, compare the most used words, apply some sentiment scores across different groups of interest (OpenAI, IBM, Academia, Congress), and see what might lie between the lines. Spoiler alert! Out of the 29,000 words analyzed, only 70 (0.24%) were related to words like regulation, regulate, regulatory, or legislation.

It’s important to note that this article is not about my takeaways from the AI hearing or Mr. ChatGPT, Sam Altman. Instead, it focuses on what lies beneath the words of each part of society (Private sector, Academia, Government) represented in this session under the roof of Capitol Hill, and what we can learn from how those words mix with one another.

Considering that the next few months are interesting times for the future of regulation on Artificial Intelligence, as the final draft of the EU AI Act awaits debate in the European Parliament (expected to take place in June), it’s worth exploring what’s behind the discussions surrounding AI on this side of the Atlantic.

STEP-01: GET THE DATA

I used the transcript published by Justin Hendrix in Tech Policy Press (available here).


While Hendrix mentions it’s a quick transcript and suggests confirming quotes by watching the Senate Hearing video, I still found it to be quite accurate and interesting for this analysis. If you want to watch the Senate Hearing or read the testimonies of Sam Altman (OpenAI), Christina Montgomery (IBM), and Gary Marcus (Professor at New York University), you can find them here.

Initially, I planned to copy the transcript into a Word document and manually build a table in Excel with the participants’ names, the organizations they represented, and their comments. However, this approach was time-consuming and inefficient. So, I turned to Python and loaded the full transcript from a Microsoft Word file into a data frame. Here is the code I used:

# STEP 01 - Read the Word document
# remember to install the library first: pip install python-docx

import docx
import pandas as pd

doc = docx.Document('D:....your word file on microsoft word')

items = []
names = []
comments = []

# Iterate over paragraphs: a line ending in ':' is a speaker's name;
# every other line is a comment attributed to the last speaker seen
name = None
for paragraph in doc.paragraphs:
    text = paragraph.text.strip()

    if text.endswith(':'):
        name = text[:-1]
    else:
        items.append(len(items))
        names.append(name)
        comments.append(text)

dfsenate = pd.DataFrame({'item': items, 'name': names, 'comment': comments})

# Remove rows with empty comments
dfsenate = dfsenate[dfsenate['comment'].str.strip().astype(bool)]

# Reset the index
dfsenate.reset_index(drop=True, inplace=True)
dfsenate['item'] = dfsenate.index + 1
print(dfsenate)

The output should look like this:

   item                            name  comment
0     1  Sen. Richard Blumenthal (D-CT)  Now for some introductory remarks.
1     2  Sen. Richard Blumenthal (D-CT)  “Too often we have seen what happens when technology outpaces regulation: the unbridled exploitation of personal data, the proliferation of disinformation, and the deepening of societal inequalities. We have seen how algorithmic biases can perpetuate discrimination and prejudice, and how the lack of transparency can undermine public trust. This is not the future we want.”
2     3  Sen. Richard Blumenthal (D-CT)  If you were listening from home, you might have thought that voice was mine and the words from me, but in fact, that voice was not mine. The words were not mine. And the audio was an AI voice cloning software trained on my floor speeches. The remarks were written by ChatGPT when it was asked how I would open this hearing. And you heard just now the result. I asked ChatGPT, why did you pick those themes and that content? And it answered. And I’m quoting, Blumenthal has a strong record in advocating for consumer protection and civil rights. He has been vocal about issues such as data privacy and the potential for discrimination in algorithmic decision making. Therefore, the statement emphasizes these aspects.
3     4  Sen. Richard Blumenthal (D-CT)  Mr. Altman, I appreciate ChatGPT’s endorsement. In all seriousness, this apparent reasoning is pretty impressive. I am sure that we’ll look back in a decade and view ChatGPT and GPT-4 like we do the first cellphone, those big clunky things that we used to carry around. But we recognize that we are on the verge, really, of a new era. The audio and my playing it may strike you as curious or humorous, but what reverberated in my mind was what if I had asked it? And what if it had provided an endorsement of Ukraine surrendering or Vladimir Putin’s leadership? That would’ve been really scary. And the prospect is more than a little scary, to use the word, Mr. Altman, you have used yourself, and I think you have been very constructive in calling attention to the pitfalls as well as the promise.
4     5  Sen. Richard Blumenthal (D-CT)  And that’s the reason why we wanted you to be here today. And we thank you and our other witnesses for joining us. For several months now, the public has been fascinated with GPT, DALL-E and other AI tools. These examples, like the homework done by ChatGPT or the articles and op-eds it can write, feel like novelties. But the underlying advancements of this era are more than just research experiments. They are no longer fantasies of science fiction. They are real and present: the promises of curing cancer, or developing new understandings of physics and biology, or modeling climate and weather. All very encouraging and hopeful. But we also know the potential harms, and we’ve seen them already: weaponized disinformation, housing discrimination, harassment of women and impersonation, fraud, voice cloning, deep fakes. These are the potential risks despite the other rewards. And for me, perhaps the biggest nightmare is the looming new industrial revolution. The displacement of millions of workers, the loss of huge numbers of jobs, the need to prepare for this new industrial revolution in skill training and relocation that may be required. And already industry leaders are calling attention to those challenges.
5     6  Sen. Richard Blumenthal (D-CT)  To quote ChatGPT, this is not necessarily the future that we want. We need to maximize the good over the bad. Congress has a choice now. We had the same choice when we faced social media. We failed to seize that moment. The result is predators on the internet, toxic content exploiting children, creating dangers for them. And Senator Blackburn and I and others like Senator Durbin on the Judiciary Committee are trying to deal with it in the Kids Online Safety Act. But Congress failed to meet the moment on social media. Now we have the obligation to do it on AI before the threats and the risks become real. Sensible safeguards are not in opposition to innovation. Accountability is not a burden, far from it. They are the foundation of how we can move ahead while protecting public trust. They are how we can lead the world in technology and science, but also in promoting our democratic values.
6     7  Sen. Richard Blumenthal (D-CT)  Otherwise, in the absence of that trust, I think we may well lose both. These are sophisticated technologies, but there are basic expectations common in our law. We can start with transparency. AI companies ought to be required to test their systems, disclose known risks, and allow independent researcher access. We can establish scorecards and nutrition labels to encourage competition based on safety and trustworthiness, limitations on use. There are places where the risk of AI is so extreme that we ought to restrict or even ban their use, especially when it comes to commercial invasions of privacy for profit and decisions that affect people’s livelihoods. And of course, accountability, reliability. When AI companies and their clients cause harm, they should be held liable. We should not repeat our past mistakes, for example, Section 230. Forcing companies to think ahead and be responsible for the ramifications of their business decisions can be the most powerful tool of all. Garbage in, garbage out. The principle still applies. We ought to beware of the garbage, whether it’s going into these platforms or coming out of them.

Next, I considered adding some labels for future analysis, identifying the individuals by the segment of society they represented:


# Assign sectors based on names
def assign_sector(name):
    if name in ['Sam Altman', 'Christina Montgomery']:
        return 'Private'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['sector'] = dfsenate['name'].apply(assign_sector)

# Assign organizations based on names
def assign_organization(name):
    if name == 'Sam Altman':
        return 'OpenAI'
    elif name == 'Christina Montgomery':
        return 'IBM'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['Organization'] = dfsenate['name'].apply(assign_organization)

print(dfsenate)

Finally, I decided to add a column that counts the words in each statement, which could also help us with further analysis.

dfsenate['WordCount'] = dfsenate['comment'].apply(lambda x: len(x.split()))

At this point, your dataframe should look like this:

     item                            name  ... Organization WordCount
0       1  Sen. Richard Blumenthal (D-CT)  ...     Congress         5
1       2  Sen. Richard Blumenthal (D-CT)  ...     Congress        55
2       3  Sen. Richard Blumenthal (D-CT)  ...     Congress       125
3       4  Sen. Richard Blumenthal (D-CT)  ...     Congress       145
4       5  Sen. Richard Blumenthal (D-CT)  ...     Congress       197
..    ...                             ...  ...          ...       ...
399   400         Sen. Cory Booker (D-NJ)  ...     Congress       156
400   401                      Sam Altman  ...       OpenAI       180
401   402         Sen. Cory Booker (D-NJ)  ...     Congress        72
402   403  Sen. Richard Blumenthal (D-CT)  ...     Congress       154
403   404  Sen. Richard Blumenthal (D-CT)  ...     Congress        98

STEP-02: VISUALIZE THE DATA

Let’s take a look at the numbers we have so far: 404 questions or testimonies and almost 29,000 words. These numbers give us the material we need to get started. It’s important to know that some statements were split into smaller parts. When a long statement contained several paragraphs, the code divided it into separate statements, even though they were actually part of one contribution. To get a better understanding of each participant’s involvement, I also considered the number of words they used, which gave another perspective on their engagement.
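
If you would rather treat each uninterrupted turn as a single statement, one possible adjustment (not applied in the rest of this article) is to merge consecutive rows that share the same speaker. A minimal sketch, where dfturns is just an illustrative name:

# Sketch only: collapse consecutive rows by the same speaker into one statement
# (the analysis below keeps the split rows as they are)
turn_id = dfsenate['name'].ne(dfsenate['name'].shift()).cumsum()  # new id at each speaker change
dfturns = (dfsenate.groupby(turn_id)
           .agg({'name': 'first', 'comment': ' '.join})
           .reset_index(drop=True))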

Hearing on Oversight of AI: Figure 01

As you can see in Figure 01, interventions by members of Congress represented more than half of the entire hearing, followed by Sam Altman’s testimony. However, an alternative view obtained by counting the words from each side shows a more balanced representation between Congress (11 members) and the panel composed of Altman (OpenAI), Montgomery (IBM), and Marcus (Academia).

It’s interesting to note the different levels of engagement among the members of Congress who participated in the Senate hearing (view the table below). As expected, Sen. Blumenthal, as the Subcommittee Chair, was highly engaged. But what about the other members? The table shows significant differences in engagement among all eleven participants. Remember, the quantity of contributions doesn’t necessarily indicate their quality. I’ll let you make your own judgment once you review the numbers.

Finally, even though Sam Altman received a lot of attention, it’s worth noting that Gary Marcus, although he may appear to have had few opportunities to participate, had a lot to say, as indicated by his word count, which is similar to Altman’s. Or is it perhaps because academia often provides detailed explanations, while the business world prefers practicality and simplicity?

All right, Professor Marcus, if you could be specific. This is your shot, man. Talk in plain English and tell me what, if any, rules we ought to implement. And please don’t just use concepts. I’m looking for specificity.

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

#*****************************PIE CHARTS************************************
import pandas as pd
import matplotlib.pyplot as plt

# Pie chart - Grouping by 'Organization' (Questions & Testimonies)
org_colors = {'Congress': '#6BB6FF', 'OpenAI': 'green', 'IBM': 'lightblue', 'Academia': 'lightyellow'}
org_counts = dfsenate['Organization'].value_counts()

plt.figure(figsize=(8, 6))
patches, text, autotext = plt.pie(org_counts.values, labels=org_counts.index,
                                  autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_counts.values) / 100)})',
                                  startangle=90, colors=[org_colors.get(org, 'gray') for org in org_counts.index])
plt.title('Hearing on Oversight of AI: Questions or Testimonies')
plt.axis('equal')
plt.setp(text, fontsize=12)
plt.setp(autotext, fontsize=12)
plt.show()

# Pie chart - Grouping by 'Organization' (WordCount)
org_wordcount = dfsenate.groupby('Organization')['WordCount'].sum()

plt.figure(figsize=(8, 6))
patches, text, autotext = plt.pie(org_wordcount.values, labels=org_wordcount.index,
                                  autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_wordcount.values) / 100)})',
                                  startangle=90, colors=[org_colors.get(org, 'gray') for org in org_wordcount.index])

plt.title('Hearing on Oversight of AI: WordCount')
plt.axis('equal')
plt.setp(text, fontsize=12)
plt.setp(autotext, fontsize=12)
plt.show()

#************Engagement among the members of Congress**********************

# Group by name and count the rows
Summary_Name = dfsenate.groupby('name').agg(comment_count=('comment', 'size')).reset_index()

# Total word count for each name
Summary_Name['Total_Words'] = dfsenate.groupby('name')['WordCount'].sum().values

# Percentage distribution for comment_count
Summary_Name['comment_count_%'] = Summary_Name['comment_count'] / Summary_Name['comment_count'].sum() * 100

# Percentage distribution for total word count
Summary_Name['Word_count_%'] = Summary_Name['Total_Words'] / Summary_Name['Total_Words'].sum() * 100

Summary_Name = Summary_Name.sort_values('Total_Words', ascending=False)

print(Summary_Name)
+-------+--------------------------------+---------------+-------------+-----------------+--------------+
| index | name | Interventions | Total_Words | Interv_% | Word_count_% |
+-------+--------------------------------+---------------+-------------+-----------------+--------------+
| 2 | Sam Altman | 92 | 6355 | 22.77227723 | 22.32252626 |
| 1 | Gary Marcus | 47 | 5105 | 11.63366337 | 17.93178545 |
| 15 | Sen. Richard Blumenthal (D-CT) | 58 | 3283 | 14.35643564 | 11.53184165 |
| 10 | Sen. Josh Hawley (R-MO) | 25 | 2283 | 6.188118812 | 8.019249008 |
| 0 | Christina Montgomery | 36 | 2162 | 8.910891089 | 7.594225298 |
| 6 | Sen. Cory Booker (D-NJ) | 20 | 1688 | 4.95049505 | 5.929256384 |
| 7 | Sen. Dick Durbin (D-IL) | 8 | 1143 | 1.98019802 | 4.014893393 |
| 11 | Sen. Lindsey Graham (R-SC) | 32 | 880 | 7.920792079 | 3.091081527 |
| 5 | Sen. Christopher Coons (D-CT) | 6 | 869 | 1.485148515 | 3.052443008 |
| 12 | Sen. Marsha Blackburn (R-TN) | 14 | 869 | 3.465346535 | 3.052443008 |
| 4 | Sen. Amy Klobuchar (D-MN) | 11 | 769 | 2.722772277 | 2.701183744 |
| 13 | Sen. Mazie Hirono (D-HI) | 7 | 755 | 1.732673267 | 2.652007447 |
| 14 | Sen. Peter Welch (D-VT) | 11 | 704 | 2.722772277 | 2.472865222 |
| 3 | Sen. Alex Padilla (D-CA) | 7 | 656 | 1.732673267 | 2.304260775 |
+-------+--------------------------------+---------------+-------------+-----------------+--------------+

STEP-03: TOKENIZATION

Here is where the natural language processing (NLP) fun begins. To analyze the text, we’ll use the NLTK package in Python, which provides useful tools for word frequency analysis and visualization. The following libraries and modules supply what we need:


#pip install nltk
#pip install spacy
#pip install wordcloud
#python -m spacy download en_core_web_sm
# (subprocess is part of the Python standard library, so it needs no install)

First, we’ll start with tokenization, which means breaking the text into individual words, known as “tokens.” For this, we’ll use spaCy, an open-source NLP library that can handle contractions, punctuation, and special characters. Next, we’ll remove common words that don’t add much meaning, like “a,” “an,” “the,” “is,” and “and,” using the stop words resource from the NLTK library. Finally, we’ll apply lemmatization, which reduces words to their base form, known as the lemma. For example, “running” becomes “run” and “happier” becomes “happy.” This technique helps us work with the text more effectively and understand its meaning.

To summarize:

o Tokenize the text.

o Remove common words.

o Apply lemmatization.

#***************************WORD-FREQUENCY*******************************

import subprocess
import nltk
import spacy
from nltk.probability import FreqDist
from nltk.corpus import stopwords

# Download resources
subprocess.run('python -m spacy download en_core_web_sm', shell=True)
nltk.download('punkt')
nltk.download('stopwords')

# Load spaCy model and set stopwords
nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Tokenize, keep alphabetic tokens, lowercase, drop stopwords, lemmatize
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate comments and create the frequency distribution
all_comments = " ".join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)
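
As a quick, illustrative sanity check of the pipeline (the exact lemmas may vary slightly with the spaCy model version):

# Illustrative check: tokenize, drop stopwords, lemmatize
print(preprocess_text("Senators were thinking about regulating these AI models."))
# Expected output along the lines of: ['senator', 'think', 'regulate', 'ai', 'model']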

#**********************HEARING TOP 30 COMMON WORDS*********************
import matplotlib.pyplot as plt
import numpy as np

# Most common words and their frequencies
top_words = fdist.most_common(30)
words = [word for word, freq in top_words]
frequencies = [freq for word, freq in top_words]

# Bar plot - Hearing on Oversight of AI: Top 30 Most Common Words
fig, ax = plt.subplots(figsize=(8, 10))
ax.barh(range(len(words)), frequencies, align='center', color='skyblue')

ax.invert_yaxis()
ax.set_xlabel('Frequency', fontsize=12)
ax.set_ylabel('Words', fontsize=12)
ax.set_title('Hearing on Oversight of AI: Top 30 Most Common Words', fontsize=14)
ax.set_yticks(range(len(words)))
ax.set_yticklabels(words, fontsize=10)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_linewidth(0.5)
ax.spines['bottom'].set_linewidth(0.5)
ax.tick_params(axis='x', labelsize=10)
plt.subplots_adjust(left=0.3)

# Annotate each bar with its frequency
for i, freq in enumerate(frequencies):
    ax.text(freq + 5, i, str(freq), va='center', fontsize=8)

plt.show()

Hearing on Oversight of AI: Figure 02

As you can see in the bar plot (Figure 02), there was a lot of “thinking.” Maybe the first five words give us an interesting hint of what we should do today and for our future in terms of AI:

“We need to think and know where AI should go.”

As I mentioned at the beginning of this article, at first sight, “regulation” doesn’t stand out as a frequently used word in the Senate AI hearing. However, concluding that it wasn’t a topic of primary concern could be inaccurate. The interest in whether AI should or shouldn’t be regulated was expressed with different words such as “regulation,” “regulate,” “agency,” or “regulatory.” Therefore, let’s make some adjustments to the code, aggregate these words, and re-run the bar plot to see how it affects the analysis.

nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate comments and create the frequency distribution
all_comments = " ".join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)
original_fdist = fdist.copy()  # Save the original object

aggregate_words = ['regulation', 'regulate', 'agency', 'regulatory', 'legislation']
aggregate_freq = sum(fdist[word] for word in aggregate_words)
df_aggregatereg = pd.DataFrame({'Word': aggregate_words,
                                'Frequency': [fdist[word] for word in aggregate_words]})

# Remove the individual words and add the aggregate
for word in aggregate_words:
    del fdist[word]
fdist['regulation+agency'] = aggregate_freq

# Pie chart for regulation+agency distribution
import matplotlib.pyplot as plt

labels = df_aggregatereg['Word']
values = df_aggregatereg['Frequency']

plt.figure(figsize=(8, 6))
plt.subplots_adjust(top=0.8, bottom=0.25)

patches, text, autotext = plt.pie(values, labels=labels,
                                  autopct=lambda p: f'{p:.1f}%\n({int(p * sum(values) / 100)})',
                                  startangle=90, colors=['#6BB6FF', 'green', 'lightblue', 'lightyellow', 'gray'])

plt.title('Regulation+agency: Distribution', fontsize=14)
plt.axis('equal')
plt.setp(text, fontsize=8)
plt.setp(autotext, fontsize=8)
plt.show()

Hearing on Oversight of AI: Figure 03

As you can see in Figure 03, the topic of regulation did come up many times during the Senate AI hearing after all.

STEP-04: WHAT HIDES BEHIND THE WORDS

Words alone may give us some clues, but it’s the interconnection of words that really offers some perspective. So, let’s take an approach using word clouds to explore whether we can uncover insights that can’t be shown by simple bar and pie charts.

# Word cloud - Senate Hearing on Oversight of AI
from wordcloud import WordCloud

wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud - Senate Hearing on Oversight of AI')
plt.show()

Hearing on Oversight of AI: Figure 04

Let’s explore further and compare the word clouds for the different groups of interest represented in the AI hearing (Private, Congress, Academia) and see whether their words reveal different perspectives on the future of AI.

# Word clouds for each group of interest
organizations = dfsenate['Organization'].unique()
for organization in organizations:
    comments = dfsenate[dfsenate['Organization'] == organization]['comment']
    all_comments = " ".join(comments)
    processed_comments = preprocess_text(all_comments)
    fdist_organization = FreqDist(processed_comments)

    # Word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist_organization)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    if organization == 'IBM':
        plt.title(f'Word Cloud: {organization} - Christina Montgomery')
    elif organization == 'OpenAI':
        plt.title(f'Word Cloud: {organization} - Sam Altman')
    elif organization == 'Academia':
        plt.title(f'Word Cloud: {organization} - Gary Marcus')
    else:
        plt.title(f'Word Cloud: {organization}')
    plt.show()

Hearing on Oversight of AI: Figure 05

It’s interesting how some words appear (or disappear) for each group of interest represented in the Senate AI hearing when they talk about artificial intelligence.

Regarding the big headline, “Sam Altman’s call for regulating AI”: well, whether he’s in favor of regulation or not, I really can’t tell, but there doesn’t seem to be much “regulation” in his words to me. Instead, Sam Altman seems to take a people-centric approach when he talks about AI, repeating words like “think,” “people,” “know,” “important,” and “use,” and relying more on words like “technology,” “system,” or “model” instead of the word “AI.”

Someone who did have something to say about “risk” and “issues” was Christina Montgomery (IBM), who repeated these words constantly when talking about “technology,” “companies,” and “AI.” An interesting fact in her testimony is finding the words most of us expect to hear from companies involved in developing technology: “trust,” “governance,” and “thinking” about what is “right” in terms of AI.

We need to hold companies accountable today and responsible for the AI that they’re deploying…..

Christina Montgomery. US Senate Hearing on Oversight of AI (2023)

Gary Marcus, in his initial statement, said: “I come as a scientist, someone who’s founded AI companies, and someone who genuinely loves AI…” So, for the sake of this NLP analysis, we’re considering him a representation of the voice of Academia. Words like “need,” “think,” “know,” “go,” and “people” stand out among others. An interesting fact is that the word “system” seems to be repeated more than “AI” in his testimony. Maybe AI is not a single, lone technology that will change the future; the impact on the future will come from multiple technologies or systems interacting with one another (IoT, robotics, BioTech, etc.) rather than relying solely on one of them.

In the end, the first hypothesis mentioned by Senator John Kennedy seems not entirely false after all (not just for Congress but for society as a whole). We are still at the stage where we are trying to understand the direction AI is heading.

Allow me to share with you three hypotheses that I would like you to assume for the moment to be true. Hypothesis number one, many members of Congress do not understand artificial intelligence. Hypothesis number two, that absence of understanding may not prevent Congress from plunging in with enthusiasm and trying to regulate this technology in a way that could hurt this technology. Hypothesis number three, that I would like you to assume, there is likely a berserk wing of the artificial intelligence community that intentionally or unintentionally could use artificial intelligence to kill all of us and hurt us the entire time that we are dying…..

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

STEP-05: THE EMOTION BEHIND YOUR WORDS

We’ll use the SentimentIntensityAnalyzer class from the NLTK library for sentiment analysis. This pre-trained model uses a lexicon-based approach, where each word in the lexicon (VADER) has a predefined sentiment polarity value. The sentiment scores of the words in a piece of text are aggregated to calculate an overall sentiment score. The numerical value ranges from -1 (negative sentiment) to +1 (positive sentiment), with 0 indicating neutral sentiment. Positive sentiment reflects a favorable emotion, attitude, or enthusiasm, while negative sentiment conveys an unfavorable emotion or attitude.
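
Before scoring the whole dataframe, it may help to see what the analyzer returns for a single sentence. A minimal, illustrative call (the exact figures depend on the VADER lexicon version):

from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

# polarity_scores returns neg/neu/pos ratios plus the normalized
# 'compound' score in [-1, 1]; we keep only 'compound' below
print(sid.polarity_scores("I think you have been very constructive."))
# A dict like {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}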

#************SENTIMENT ANALYSIS************
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()
dfsenate['Sentiment'] = dfsenate['comment'].apply(lambda x: sid.polarity_scores(x)['compound'])

#************BOXPLOT - GROUP OF INTEREST************
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('white')
plt.figure(figsize=(12, 7))
sns.boxplot(x='Sentiment', y='Organization', data=dfsenate, color='yellow',
            width=0.6, showmeans=True, showfliers=True)

# Customize the axis
def add_cosmetics(title='Sentiment Analysis Distribution by Group of Interest',
                  xlabel='Sentiment'):
    plt.title(title, fontsize=28)
    plt.xlabel(xlabel, fontsize=20)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
    sns.despine()

def customize_labels(label):
    if 'OpenAI' in label:
        return label + '-Sam Altman'
    elif 'IBM' in label:
        return label + '-Christina Montgomery'
    elif 'Academia' in label:
        return label + '-Gary Marcus'
    else:
        return label

# Apply customized labels to the y-axis
yticks = plt.yticks()[1]
plt.yticks(ticks=plt.yticks()[0],
           labels=[customize_labels(label.get_text()) for label in yticks])

add_cosmetics()
plt.show()

Hearing on Oversight of AI: Figure 06

A boxplot is always interesting since it shows the minimum and maximum values, the median, and the first (Q1) and third (Q3) quartiles. In addition, a line of code was added to display the mean value. (Acknowledgment to Elena Kosourova for designing the boxplot code template; I only made adjustments for my dataset.)

Overall, everyone seemed to be in a good mood during the Senate hearing, especially Sam Altman, who stood out with the highest sentiment scores, followed by Christina Montgomery. On the other hand, Gary Marcus seemed to have a more neutral experience (median around 0.25), and he may have felt somewhat uncomfortable at times, with values close to 0 or even negative. In addition, Congress as a whole displayed a left-skewed distribution in its sentiment scores, indicating a tendency toward neutrality or positivity. Interestingly, if we take a closer look, certain interventions stood out with extremely high or low sentiment scores.

Hearing on Oversight of AI: Figure 07

Maybe we shouldn’t interpret the results as if people in the Senate AI hearing were happy or uncomfortable. Maybe they suggest that those who participated in the hearing may not hold a very positive view of where AI is headed, but at the same time, they are not pessimistic either. The scores may indicate that there are some concerns and that participants are being cautious about the direction AI should take.

And what about a timeline? Did the mood during the hearing stay the same throughout? How did the mood of each group of interest evolve? To analyze the timeline, I arranged the statements in the order they were captured and performed a sentiment analysis. Since there are over 400 questions or testimonies, I defined a moving average of the sentiment scores for each group of interest (Congress, Academia, Private), using a window size of 10. This means the moving average is calculated by averaging the sentiment scores over every 10 consecutive statements:

#**************************TIMELINE US SENATE AI HEARING**************************************

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# Moving average for each group
window_size = 10
organizations = dfsenate['Organization'].unique()

# Create the line plot
color_palette = sns.color_palette('Set2', len(organizations))

plt.figure(figsize=(12, 6))
org_frames = {}  # keep each group's frame for the annotations below
for i, org in enumerate(organizations):
    df_org = dfsenate[dfsenate['Organization'] == org].copy()

    # Moving average (missing values filled with 0)
    df_org['Sentiment'] = df_org['Sentiment'].fillna(0)
    df_org['Moving_Average'] = df_org['Sentiment'].rolling(window=window_size, min_periods=1).mean()
    org_frames[org] = df_org

    # Smooth the moving average with a cubic spline for plotting
    x = np.linspace(df_org.index.min(), df_org.index.max(), 500)
    spl = make_interp_spline(df_org.index, df_org['Moving_Average'], k=3)
    y = spl(x)
    plt.plot(x, y, linewidth=2, label=f'{org} {window_size}-Point Moving Average', color=color_palette[i])

plt.xlabel('Statement Number', fontsize=12)
plt.ylabel('Sentiment Score', fontsize=12)
plt.title('Sentiment Score Evolution during the Hearing on Oversight of AI', fontsize=16)
plt.legend(fontsize=12)
plt.grid(color='lightgray', linestyle='--', linewidth=0.5)
plt.axhline(0, color='black', linewidth=0.5, alpha=0.5)

# Annotate the final moving-average value for each group
for org in organizations:
    df_org = org_frames[org]
    plt.text(df_org.index[-1], df_org['Moving_Average'].iloc[-1],
             f'{df_org["Moving_Average"].iloc[-1]:.2f}',
             ha='right', va='top', fontsize=12, color='black')

plt.tight_layout()
plt.show()

Hearing on Oversight of AI: Figure 08

At the beginning, the session seemed friendly and optimistic, with everyone discussing the future of AI. But as the session went on, the mood started to change. The members of Congress became less optimistic, and their questions became more challenging. This affected the panelists’ scores, with some even receiving low scores (you can see this toward the end of the session). Interestingly, Altman was seen by the model as neutral or slightly positive, even during the tense moments with the members of Congress.

It’s important to remember that the model has its limitations and may border on subjectivity. While sentiment analysis isn’t flawless, it offers us an interesting glimpse into the intensity of emotions that prevailed that day on Capitol Hill.

In my opinion, the lessons behind this US Senate AI hearing lie in the five most repeated words: “We need to think and know where AI should go.” It’s noteworthy that words like “people” and “importance” were unexpectedly present in Sam Altman’s word cloud, going beyond the headline of a “call for regulation.” While I hoped to find more words like “transparency,” “accountability,” “trust,” “governance,” and “fairness” in Altman’s NLP analysis, it was a relief to find some of them frequently repeated in Christina Montgomery’s testimony. This is what we all expect to hear more often when AI is on the table.

Gary Marcus emphasized “system” as much as “AI,” perhaps inviting us to see artificial intelligence in a broader context. Multiple technologies are emerging right now, and their combined impact on society, work, and employment in the future will come from the collision of these multiple technologies, not just from one of them. Academia plays a crucial role in guiding this path and in determining whether some form of regulation is needed. I say this “literally,” not “spiritually” (inside joke from the six-month moratorium letter).

Finally, the word “agency” was repeated as much as “regulation” in its different forms. This suggests that the concept of an “agency for AI” and its role will likely be a topic of debate in the near future. An interesting reflection on this issue was offered in the Senate AI hearing by Sen. Richard Blumenthal:

…Most of my career has been in enforcement. And I’ll tell you something, you can create 10 new agencies, but if you don’t give them the resources, and I’m talking not just about dollars, I’m talking about scientific expertise, you guys will run circles around ’em. And it isn’t just the models or the generative AI that will run circles around them, but it is the scientists in your companies. For every success story in government regulation, you can think of five failures…. And I hope our experience here will be different…

Sen. Richard Blumenthal (D-CT). US Senate Hearing on Oversight of AI (2023)

Although reconciling innovation, awareness, and regulation is challenging for me, I’m all for raising awareness about AI’s role in our present and future, while also understanding that “research” and “development” are different things. The first should be encouraged and promoted, not contained; the second is where the extra effort in the “thinking” and “knowing” is required.

I hope you found this NLP analysis interesting, and I want to thank Justin Hendrix and Tech Policy Press for allowing me to use their transcript in this article. You can access the complete code in this GitHub repository. (Acknowledgement also to ChatGPT for helping me fine-tune some of my code for a better presentation.)

Did I miss anything? Your suggestions are always welcome; let’s keep the conversation going.

