Computational Propaganda
Exploring the World of Computational Propaganda and Graph Models
Issue no 1
Welcome to issue no. 1 of my newsletter, where I share thoughts on my data engineering journey. This edition focuses on my recent fixation: computational propaganda and graph models. As a self-proclaimed search junkie, I’ve been diving deep into this subject, and I’m excited to share my findings with you.
Thanks for reading Deep vision! Subscribe for free to receive new posts and support my work.
Tabular data is strange
Tabular data has always seemed like a strange concept to me. The idea of taking elements of the world and assuming their value is independent of other values is tempting but ultimately reductionist. With the rapid advancements in computational power and storage techniques, it’s becoming increasingly clear that we need a more representative model of the world that takes into account the relationships and connections between different elements.
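To make the contrast concrete, here is a minimal sketch in Python. The accounts, post counts, and "follows" relationships are made up for illustration, and the helper names are hypothetical. The only point is that a table row describes each element in isolation, while an edge list lets us ask questions about connections.

```python
from collections import defaultdict

# Hypothetical toy data: each row describes one account in isolation.
rows = [
    {"account": "alice", "posts": 120},
    {"account": "bob", "posts": 45},
    {"account": "carol", "posts": 300},
]

# Hypothetical relationships between the same accounts (who follows whom).
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]


def build_adjacency(edges):
    """Turn a list of (source, target) pairs into an adjacency mapping."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return adjacency


def followers_of(account, edges):
    """Return the set of accounts that follow `account`."""
    return {source for source, target in edges if target == account}


adjacency = build_adjacency(follows)

# The table alone cannot answer "who follows bob?"; the edge list can.
print(sorted(followers_of("bob", follows)))
```

The table and the edge list describe the same accounts, but only the second representation keeps the relationships that the rest of this newsletter is about.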
My new fixation
Over the past few months, I’ve been exploring a variety of techniques and technologies that let me work with data in a more relational way, specifically graph and network models. In this newsletter, I’ve compiled a selection of resources on these topics and added a touch of digital propaganda, another theme that I’m passionate about.
I hope you enjoy this edition, and I look forward to your feedback as I continue to co-develop this newsletter. The format and structure are still a work in progress, but I’m eager to find a balance between my passions and your interests. So, let’s dive in and explore the world of computational propaganda and graph models together.
Ignore the rest
If graph algorithms and computational social science are not something you are interested in, feel free to ignore the rest of this newsletter.
Resources
I will keep updating these resources.
Institut Polytechnique de Paris researchers
- Thomas Bonald, Professor at Institut Polytechnique de Paris
- Tiphaine Viard, Associate Professor at Institut Polytechnique de Paris
- Simon Delarue, Ph.D. candidate at Institut Polytechnique de Paris
- Haileleul Haile, Master's student at Institut Polytechnique de Paris
- Telecom Graph Mining course
scikit-network
Python package for the analysis of large graphs
Interesting tutorials and packages:
Geometric graphs
A pedagogical resource for beginners and experts to explore the design space of Graph Neural Networks for geometric graphs.
A Gentle Introduction to Geometric Graph Neural Networks
A practical notebook on Geometric GNNs 101, prepared for MPhil students at the University of Cambridge.
Graph-tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks). Contrary to most other Python modules with similar functionality, the core data structures and algorithms are implemented in C++, making extensive use of template metaprogramming, based heavily on the Boost Graph Library. This confers it a level of performance that is comparable (both in memory usage and computation time) to that of a pure C/C++ library.
Apache TinkerPop™
Apache TinkerPop™ is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
NetworKit
NetworKit is a growing open-source toolkit for large-scale network analysis. Its aim is to provide tools for the analysis of large networks in the size range from thousands to billions of edges. For this purpose, it implements efficient graph algorithms, many of them parallel to utilize multicore architectures. These are meant to compute standard measures of network analysis, such as degree sequences, clustering coefficients, and centrality measures. In this respect, NetworKit is comparable to packages such as NetworkX, albeit with a focus on parallelism and scalability.
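As a rough illustration of two of the measures mentioned above, the sketch below computes a degree sequence and local clustering coefficients on a small undirected graph in plain Python. It deliberately does not use NetworKit's API; the toy graph and the helper names are assumptions made for illustration, and a library such as NetworKit or NetworkX would be the practical choice for real networks.

```python
from itertools import combinations

# Hypothetical undirected toy graph, stored as an adjacency mapping.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def degree_sequence(adj):
    """Return the node degrees sorted in descending order."""
    return sorted((len(neighbors) for neighbors in adj.values()), reverse=True)


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # With fewer than two neighbors there are no pairs to connect.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


print(degree_sequence(graph))
print({node: local_clustering(graph, node) for node in graph})
```

Each of these loops is independent per node, which is roughly why such measures lend themselves to the parallel implementations that NetworKit focuses on.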
Boost
The Boost Graph Library (BGL)
The Boost Graph Library is a generic interface that allows access to a graph’s structure, but hides the details of the implementation. This is an “open” interface in the sense that any graph library that implements this interface will be interoperable with the BGL generic algorithms and with other algorithms that also use this interface. The BGL provides some general purpose graph classes that conform to this interface, but they are not meant to be the “only” graph classes; there certainly will be other graph classes that are better for certain situations. We believe that the main contribution of the BGL is the formulation of this interface.
Interesting papers
- Stream Graphs and Link Streams for the Modeling of Interactions over Time
- BERT with Entity Mapping for Propaganda Classification
This paper uses shallow Natural Language Processing (NLP) preprocessing techniques to reduce the noise in the dataset, along with feature selection methods and common supervised machine learning algorithms. The final model is based on BERT with entity mapping. To improve the model’s accuracy, the authors map certain words into five distinct categories using word classes and entity recognition.
- GNNExplainer: Generating Explanations for Graph Neural Networks
The paper formulates GNNExplainer as an optimization task that maximizes the mutual information between a GNN’s prediction and the distribution of possible subgraph structures. Experiments on synthetic and real-world graphs show that the approach can identify important graph structures as well as node features, and outperforms alternative baseline approaches by up to 43.0% in explanation accuracy. (+)
- A collection of knowledge graph papers, codes, and reading notes.
- A Review of Relational Machine Learning for Knowledge Graphs
A review of how statistical models of relational data can be “trained” on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph).
- Text Generation from Knowledge Graphs with Graph Transformers
The paper introduces a novel graph transforming encoder that can leverage the relational structure of knowledge graphs without imposing linearization or hierarchical constraints. Incorporated into an encoder-decoder setup, it provides an end-to-end trainable system for graph-to-text generation.
- Synthetic Graph Generation to Benchmark Graph Learning
This paper proposes generating synthetic graphs to study the behaviour of graph learning algorithms in a controlled scenario. The authors develop a fully featured synthetic graph generator that allows deep inspection of different models. They argue that synthetic graph generation allows for thorough investigation of algorithms and provides more insight than overfitting on three citation datasets. A case study shows how the framework provides insight into unsupervised and supervised graph neural network models.
- Node-Level Differentially Private Graph Neural Networks
This paper defines the problem of learning GNN parameters with node-level privacy and provides an algorithmic solution with a strong differential privacy guarantee. The authors employ a careful sensitivity analysis and provide a non-trivial extension of the privacy-by-amplification technique to the GNN setting. An empirical evaluation on standard benchmark datasets demonstrates that this method is indeed able to learn accurate privacy-preserving GNNs which outperform both private and non-private methods that completely ignore graph information.
- Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble
The task, Detection of Persuasion Techniques in Texts and Images, is to detect persuasion techniques in memes. It consists of three subtasks: (A) multi-label classification using textual content, (B) multi-label classification and span identification using textual content, and (C) multi-label classification using visual and textual content. The authors propose a transfer learning approach to fine-tune BERT-based models in different modalities.
- Context-Aware Rich Feature Representations For Propaganda Classification
This paper describes a system for the Detection of Propaganda Techniques in News Articles task, covering its two subtasks of Span Identification and Technique Classification. The authors use a pre-trained BERT language model enhanced with tagging techniques developed for Named Entity Recognition (NER) to identify propaganda spans in the text. For the second subtask, they incorporate contextual features in a pre-trained RoBERTa model for the classification of propaganda techniques. (+)
- Risks of Propaganda-As-A-Service and Countermeasures
An investigation of a threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to “spin” their outputs so as to support an adversary-chosen sentiment or point of view, but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization.
https://github.com/ebagdasa/propaganda_as_a_service/blob/master/Spinning_Language_Models_for_Propaganda_As_A_Service.ipynb
Spinned models are located on HuggingFace Hub.
https://github.com/ebagdasa/propaganda_as_a_service
Other KNOWLEDGE-GRAPH-PAPERS [1]
One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks
A one-class classification framework for graph anomaly detection. OCGNN is designed to combine the powerful representation ability of Graph Neural Networks with the classical one-class objective. Compared with other baselines, OCGNN achieves significant improvements in extensive experiments. (+)
Textbooks / Courses
- Spectral and Algebraic Graph Theory
- Current version available at http://cs-www.cs.yale.edu/homes/spielman/sagt.
- Graph Representation Learning by William L. Hamilton
- Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg
- Network Science by Albert-László Barabási
- Stanford CS224W: Machine Learning with Graphs | 2021
- The Structure of Information Networks Computer Science 6850 Cornell University Fall 2008
- Networks Economics 2040 / Sociology 2090 / Computer Science 2850 / Information Science 2040 Cornell University, Fall 2018
- Networks (Daron Acemoglu and Asu Ozdaglar, MIT)
- Topics in Social Data (Johan Ugander, Stanford)
- Network Theory (Mark Newman, University of Michigan)
- Graphs and Networks (Dan Spielman, Yale)
- Parallel Graph Analysis (George Slota, RPI)
- Large-Scale Graph Mining (A. Erdem Sariyuce, University at Buffalo)
- Mining Large-scale Graph Data (Danai Koutra, University of Michigan)
- Data Mining meets Graph Mining (Leman Akoglu, Stony Brook)
- Graphs and Networks (Charalampos Tsourakakis, Aalto University)
- Large-Scale Graph Processing (Keval Vora, Simon Fraser University)
- Applied Social Network Analysis in Python (University of Michigan)
- The Human Network: How Your Social Position Determines Your Power, Beliefs, and Behaviors
- Social and Economic Networks by Matthew O. Jackson
Labs
DIG
Data, Intelligence and Graphs
The DIG team aims at making data and knowledge easy to extract, exploit and understand. The research conducted by the team covers both theoretical and practical aspects. The outcomes of this research include new algorithms (for data mining, graph analysis, processing of data streams), new data structures and languages (for storing and querying data) and new learning techniques (for question answering, content recommendation, trend and anomaly detection).
Sciences Po Medialab
The médialab is an interdisciplinary research laboratory which conducts thematic and methodological research to investigate the role of digital technology in our societies. The Sciences Po médialab also develops several open-source packages. [2]
ISC-PIF
The ISC-PIF (Institut des Systèmes Complexes, Paris Île-de-France — Paris Île-de-France Complex Systems Institute) is an interdisciplinary research and training center that promotes the development of French, European and international strategic projects on complex adaptive systems, understood as large networks of elements interacting locally and creating macroscopic collective behavior.
Complex Networks
Complex Networks is a LIP6 team, hosted at Sorbonne Université and a member of the CNRS
Interesting podcasts/videos
- David Chavalarias talk on Toxic Data
- Graph Mining & Learning at the Neural Information Processing Systems Conference
The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.
- Conference of the Sciences Po médialab (in French)
- Fast And Educational Exploration And Analysis Of Graph Data Structures With graph-tool
- Between Two Nodes: A Conversation with Peter Hinssen
Acclaimed author Peter Hinssen — also a serial entrepreneur — has penned five bestselling books, and has given many keynote speeches around the world, including for Google Think Performance, Nimbus Ninety, Tedx, Paypal, SAS, Accenture, and Apple. He also lectures at renowned business schools like the London Business School, MIT Sloan School of Management, and more.
Dr. Petar Veličković is a Staff Research Scientist at DeepMind who has firmly established himself as one of the most significant up-and-coming researchers in the deep learning space. He invented Graph Attention Networks in 2017 and has been a leading light in the field ever since, pioneering research in Graph Neural Networks, Geometric Deep Learning and Neural Algorithmic Reasoning. If you haven’t already, you should check out our video on the Geometric Deep Learning blueprint, featuring Petar. I caught up with him last week at NeurIPS. In this show, from NeurIPS 2022, we discussed his recent work on category theory and graph neural networks.
Interesting datasets
- Propaganda Tech Twitter PRC
- A Study of 10,000 Porn Stars and Their Careers
- Twitter dataset for 2022 Russian and Ukrainian crisis
- Stanford Large Network Dataset Collection
The SNAP library has been actively developed since 2004 and is organically growing as a result of our research pursuits in analysis of large social and information networks. The largest network we analyzed so far using the library was the Microsoft Instant Messenger network from 2006 with 240 million nodes and 1.3 billion edges.
The datasets available on the website were mostly collected (scraped) for the purposes of our research.
The website was launched in July 2009.
FrameNet
The FrameNet project is building a lexical database of English that is both human- and machine-readable, based on annotating examples of how words are used in actual texts.
Open Graph Benchmark
The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader.
Recommender Systems and Personalization Datasets maintained by Julian McAuley
- Please check https://github.com/hghalebi/Blog
- KNOWLEDGE-GRAPH-PAPERS
- Entity Extraction: From Unstructured Text to DBpedia RDF Triples LINK: http://ceur-ws.org/Vol-906/paper7.pdf
- An Automatic Knowledge Graph Creation Framework from Natural Language Text LINK: https://www.jstage.jst.go.jp/article/transinf/E101.D/1/E101.D_2017SWP0006/_pdf
- Representation Learning of Knowledge Graphs with Entity Descriptions LINK: https://dl.acm.org/citation.cfm?id=3016100.3016273
- A Review of Relational Machine Learning for Knowledge Graphs LINK: https://arxiv.org/abs/1503.00759
- Seq2RDF: An end-to-end application for deriving Triples from Natural Language Text LINK: https://arxiv.org/abs/1807.01763
- Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing LINK: https://aclweb.org/anthology/N18-1029
- Building and Using a Knowledge Graph to Combat Human Trafficking LINK: https://usc-isi-i2.github.io/papers/szekely15-iswc.pdf
- Médialab packages
- Table2Net, made by the médialab
An online tool that allows users to create a network graph in a few clicks from a CSV file
https://medialab.github.io/table2net/
- minet
A webmining command line tool & library for Python (>= 3.7) that can be used to collect and extract data from a large variety of web sources such as raw webpages, Facebook, CrowdTangle, YouTube, Twitter, Media Cloud etc.
https://github.com/medialab/minet
- Web application enabling users to publish and visually analyse networks
https://medialab.github.io/minivan/#/
- HeatGraph
A visualization tool for producing heatmaps from the density of nodes in a spatialized network
https://github.com/medialab/heatgraph
- A Python library to handle, read and write the GEXF format, the XML file format for network graphs
https://github.com/paulgirard/pygexf
- sigma.js
JavaScript library used to visualize network graphs in the browser
https://github.com/jacomyal/sigma.js
- Gazouilloire
A backend tool to run massive Twitter data harvesting over the long term
https://github.com/medialab/gazouilloire
- Manylines
Web application for displaying, spatializing and categorizing a network, then writing and sharing an interactive story composed of specific views of the visualization
https://github.com/medialab/manylines