“Be careful what you wish for, lest it come true!”— Aesop’s Fables
Often, shortly after I think something is a good idea, some sort of relevant cautionary tale will manifest. And then I find myself adding more caveats or narrowing my previously held views on how the world functions. Reality burns theory on contact and reveals horrifying wonders in what remains amid the ashes.
Tl;dr: Twitter uploaded what appears to be parts of its content ranking algorithm to Github and it seems that everyone has found something in it to support what they already wanted to think about it. Publishing the code is not enough.
Stronger artificial-intelligence (AI) chatbots using large language models (LLMs) are being dumped in front of users at an increasingly rapid clip. There are scads of dilemmas around this: user safety and privacy, security, copyright and intellectual property, and even those “Big Question” existential quandaries on the nature of human creativity and identity, or at least the publishing industry thinks so.
I’ve got another rambling blog post kicking around in my head about these things. Maybe one day it will find its way into the WYSIWYG editor and I’ll actually hit the publish button on it. But we’re not there yet.
For now, here are some musings on releasing commercial product algorithms into the open source wild. One of the key things many people in Tech Ethics circles want from these AI tools being rolled out by OpenAI, Google and Microsoft is that they release their algorithms to the public. I’m one of them… with a growing mental list of conditions. If I were Marina Hyde, I’d say here’s my bit on all this, but I’m setting the bar too high, now. More like, here’s a few words of caution on what can and will go wrong if we simply assume that an openly accessible algorithm will lead to transparency, as opposed to more confusion, ire and recriminations.
So, just the other day I opined that my own preference would be to legally require open access to the algorithms and LLM data sources (if not the data) that inform them in these newly emerging advanced chatbots. It’s still possible to monetise a service on a platform in a number of ways, but there should be no secret sauce when it comes to how we’re being targeted with information. Leave it to Elon Musk’s erratic Twitter management style to put that notion to the test. At least sort of.
Elon has long said he’d open source Twitter’s algorithms when he took over the platform. This month on Github, he kind of followed through with it, publishing some of its source code for the recommendation algorithm that determines what Twitter users end up seeing or not seeing on the platform, and how their own content may be scored. This came through two repositories:
It’s so far been a good case study in what can happen if we leave it up to commercial platforms to decide which parts of the curtain they’ll pull back. It’s also been revealing in how people choose to interpret it. It hasn’t changed my mind, but it’s shown that a lot more is needed if people are going to be able to have a coherent conversation about what they’re looking at.
Both repositories make up aspects of Twitter’s recommendation algorithm. From a superficial perusal, people like Aakash Gupta have posted some finds that may not be very surprising: Likes, retweets and replies give a tweet a boost. The algorithm also seems to like images and videos. You won’t be shocked to learn that being a paid Twitter Blue user will put someone in the algorithm’s good books. Meanwhile, getting muted or blocked means you get downgraded, as do unfollows and spam or abuse reports. The thing most people use Twitter for (links to external sites) isn’t viewed favourably by the algorithm.
There are a few other bits and pieces that I think most users of the platform have safely assumed were happening. For example: the algorithm thinks you’re less valuable if you’re following more than being followed. It may be right. Most of us — myself included on a number of the algorithm’s indicators — are not terribly fascinating from the attention-capture point of view of what makes people stay on Twitter. And that’s the whole point of it: keeping people there. If you’re being honest with yourself and your 37 followers, one of which is your mom and another that’s your furries and anime fan sock puppet account, you already knew this.
But there are some less-clear questions, one of which I think deserves a mention: Does Twitter algorithmically downgrade news, political speech, or views on certain issues? Being “shadow banned”, or hidden from other Twitter users’ timelines, has been alleged in nearly every corner of the two-axis political compass chart. It was even a matter for the U.S. Congress. The issue predates Elon’s haphazard takeover and disastrous run at managing the platform. He was even one of those alleging that Twitter had an algorithmic political bent to the left. We can argue whether it did or does or not, but is this evidenced by what Twitter has released on Github?
Spoiler: the answer is “no.”
That’s not to say Twitter doesn’t have a bias. I think it does. Several, in fact. Its bias is whatever the brain worms are directing Elon to think on any given day. Its bias is still lurking in the data Twitter hasn’t released that guides the algorithm. It’s in how whatever remains of the skeleton crew Elon hasn’t yet fired is interpreting the different moderation tags the algorithm is slapping on content as fast as they can against a tsunami of incoming posts needing snap decisions. But none of that is available in Twitter’s Github repos.
Now, let’s talk about Ukraine. A number of people, including a few pretty smart ones, have suggested that the algorithm Elon’s published shows that Twitter down-ranks tweets about Russia’s war against Ukraine, and many of them coyly imply that the tweets being down-ranked are the ones more sympathetic to Ukraine. This charge relies on two lines in the PublicInterestRules.scala file in the the-algorithm repository, and another file about Twitter Spaces that seems to no longer be visible on Twitter’s Github page.
It certainly looks suspicious to have Ukraine as the only country mentioned in rows of obviously sinister-looking categories. What’s it doing there? We might easily assume that a tweet smacked with AbuseGlorificationOfViolence is going to be censored by the algorithm. We would hope something that’s actually AbusePolicyViolentSexualConduct would be removed. If we decide that the algorithm is actually doing that, then we might think that this is happening with content about Ukraine as well. What’s the algorithm doing? How is it determining which information about Ukraine counts as misinformation, and is a policy violation?
Even if we thought it was there to flag Russian propaganda or fake news about attacks, or to allege something dubious that may have geopolitical ramifications, other algorithm files seem like a more natural home for it. For example, it could have been tucked cozily into the TweetSafetyLabel.scala file, which is loaded with what look like active and deprecated tags around misinformation and elections, mentions France, the Philippines, Brazil and other countries, and includes what looks like a lot of testing. (By the way, the file names make absolutely no sense; it seems like tags or labels, flagging instructions and scoring are thrown around higgledy-piggledy. This is what a move-fast-and-break-things code base looks like, but I digress.)
Then there is the erratic billionaire man-baby himself, Elon. It’s easy to see why people wouldn’t exactly trust him on this topic. He’s opened the floodgates on Kremlin-linked Twitter accounts returning to his platform and has personally spread Russian government propaganda about Crimea. Since taking over Twitter, Elon fired the very people whose jobs were to counter disinformation operations by Russian and other troll brigades. Elon’s now disgraced “Twitter files” campaign (which gave hack Matt Taibbi a short-lived side gig after finding himself unemployable as a journalist anywhere) seemed especially curated to appeal to the Q-anon-prone, and that weird horseshoe alliance of far-left / far-right howlers who can recite every exchange Glenn Greenwald has ever had with Tucker Carlson, and who converge on the faith in Putin being the messiah for the ever looming “multipolar world.” (An aside: There is not a tankie out there that doesn’t secretly hope that in Elon Musk they can build their own George Soros.)
By now my own biases should be pretty clear. Ukraine needs to win this war, as One Ukraine, complete; Elon needs to have all his toys taken away (particularly SpaceX, give it to an adult). That doesn’t mean the algorithm file with Ukraine in it is necessarily doing anything nefarious.
To understand why we can’t really determine that Ukraine-related content is automatically being down-graded, let’s briefly look at another file, HomeGlobalParams.scala. In it we see that paying Elon for Twitter Blue can earn you favours from the algorithm. There is a listed quantifiable weight given to the attribute of having an $8 blue check.
On the opposite end, there are files in the repository that show how Twitter’s algorithm devalues some tweets. SpamVectorScoringFunction.java looks to be trying to do what it says on the tin, ranking tweets lower if they have links that don’t tick enough “reputable” boxes.
Meanwhile, the two Ukraine related strings in PublicInterestRules.scala look like:
PolicyInViolation.AbusePolicyUkraineCrisisMisinformation -> MisinfoCrisis,
...
MisinfoCrisis -> PolicyInViolation.AbusePolicyUkraineCrisisMisinformation,
Basically, it’s tying the abuse policy string to the MisinfoCrisis flag, but there’s no scoring or value attached. Instead the flag seems to trigger other criteria over in Actions.scala. There, you can see all kinds of rules. Some of them instantly toss certain types of offending tweets straight into the bin. But not necessarily MisinfoCrisis. That one seems to flow instead into yet another set of rules and criteria. And on it goes; the rabbit hole runs a long way down.
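To make the distinction concrete, here is a minimal Scala sketch of what a label mapping with no score attached might look like. Only the two identifiers AbusePolicyUkraineCrisisMisinformation and MisinfoCrisis come from the quoted lines. The object name, the trait names, the Action type and the actionFor function are all hypothetical, invented here for illustration, and this is not Twitter’s actual code.

```scala
// Hypothetical sketch only: everything except the two quoted identifiers is assumed.
object LabelMappingSketch {
  sealed trait PolicyInViolation
  case object AbusePolicyUkraineCrisisMisinformation extends PolicyInViolation

  sealed trait SafetyLabel
  case object MisinfoCrisis extends SafetyLabel

  // Forward lookup: which label corresponds to which policy.
  // Note what is absent: no weight, multiplier or ranking score.
  val policyToLabel: Map[PolicyInViolation, SafetyLabel] = Map(
    AbusePolicyUkraineCrisisMisinformation -> MisinfoCrisis
  )

  // Reverse lookup, which is all the second quoted line appears to provide.
  val labelToPolicy: Map[SafetyLabel, PolicyInViolation] =
    policyToLabel.map(_.swap)

  // Assumed downstream step: a label only routes content to further rules.
  sealed trait Action
  case object NoAction extends Action
  case object EvaluateFurtherRules extends Action

  def actionFor(labels: Set[SafetyLabel]): Action =
    if (labels.contains(MisinfoCrisis)) EvaluateFurtherRules else NoAction
}
```

In a structure like this, the mapping is a lookup table and nothing more. Whether a labelled tweet is demoted, hidden or left alone would depend entirely on rules and data that live elsewhere, which is the part we can’t see.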
What appears to be happening is that the Ukraine Misinformation tag was added at some point after Russia invaded, and about five or six months before Elon was forced to make good on his threat to buy Twitter. This would have been done in order to implement a newly announced moderation policy regarding misinformation in the conflict, which went into effect around April last year.
Essentially, a Twitter engineer was tasked with adding a Ukraine misinformation flag to the algorithm in order to prioritise content for closer scrutiny by misinformation moderation, if it failed the algorithm’s other tests. It doesn’t really indicate that Ukraine conflict news was being automatically downgraded wholesale at the time. There’s no telling what Elon’s doing now. But what looks like a task requiring five minutes or less of an engineer’s time around this month last year has now led to lazy reporting and more than a few Twitter threads of wild-eyed speculation, some by people who should really know better.
One very popular Twitter mention about Ukraine ranking was in the thread mentioned above, by Aakash. This one claimed: “Anything that is categorized as misinformation gets the rug pulled out from under it. Surprisingly, so are posts about Ukraine.” Anne Applebaum, whom I usually admire, quote tweeted Aakash’s dodgy take, adding: “Waiting for authors of ‘Twitter Files’ to write outraged posts about Twitter’s suppression of news about Ukraine.” Several others retweeted it as well, or echoed the same sentiment.
One problem with Aakash’s look at it, though, was that he wasn’t citing the algorithm for tweets at all, but something for Twitter Spaces, and maybe still not too accurately. Yet it still made the headlines for a news cycle this month, some suggesting news about the war was being demoted, others saying Twitter Spaces were being hidden. Again, there’s no substantial evidence to suggest either was the case. Elon doesn’t help matters. Any journalists reaching out for a chat with Twitter about the story just receive a “💩” as an auto reply from the de-populated communications team’s now pointless email address.
So, what are our takeaways from the whole dumb saga?
- Algorithms are all over the news, but often not understood. Cambridge Analytica showed how Facebook’s algorithm can be manipulated for political ends, even if the manipulators can’t actually see it first hand. Shoshana Zuboff popularised ‘Surveillance Capitalism’ and the endless ways we’re being manipulated by closed systems oozing from our own devices. Since ChatGPT’s launch in November last year, there’s been a deluge of hot and not-so-hot takes on how algorithms should be regulated, open sourced, controlled or even banned. But when an algorithm is made available, people need to actually look at it, or there’s no point. It’s easy to be paranoid; you don’t need open source code for that. It’s something else to scroll through Scala files and develop a coherent idea about what they mean (particularly when they’re incomplete). I prefer the random panic when people don’t have access to the source material. At least then we know they’re just guessing.
- “Who are they and why are they talking to you?” It’s one of those basic Journalism 101 lessons that seems to have been wiped off the curriculum with the acceptance of single-source leak/disclosure coverage and the notion that you can just release the files and let the public decide. Elon has gamed the concept twice. The first time was with the so-called “Twitter Files.” The second has been this alleged open sourcing of an algorithm (which was incomplete and is already out of date). In both cases these have been partial releases, crafted to support a narrative of Elon’s choosing. People took the bait each time, and learned zilch. Knock it off.
- We can’t trust that a platform’s corporate owner will necessarily release a proprietary algorithm of its own volition, or in good faith. Capitalism can only tolerate so much transparency, and just like food companies that lobbied against listing ingredients on packages for nearly a century, Big Tech loves keeping its sauce secret. From Twitter we’ve been given something that would be like a partial recipe, for a cake maybe. There’s a nice, glossy photo of what the finished item should look like, and some ingredients are listed. We’re missing the measurements and baking instructions. Currently, the only true value of Twitter’s algorithm repositories on Github is in Issue 474, where people are submitting pictures of their cats.
Until next time, read the docs and Slava Ukraini.