Vanni Zavarella

NLP Freelance Developer

Custom NLP & AI Solutions for Turning Text Data into Actionable Insights

I'm an independent software developer specializing in Natural Language Processing, Data Science and custom Machine Learning solutions, with more than 16 years of experience implementing and managing text data mining projects.

My expertise lies in applying a wide range of text mining techniques, such as Text Classification, Named Entity Recognition, Relation Extraction, Topic Modelling and Sentiment Analysis, to large multilingual text collections, and in customizing these techniques for specific domains. I have contributed to the development of automatic text mining applications with 1000+ regular users, such as the Europe Media Monitor open-domain news monitoring portal (EMM, a JRC service for DG-COMM) and MediSYS (a global health surveillance portal used by the European Centre for Disease Prevention and Control and the UN’s WHO for early warning of infectious disease outbreaks).

What I do

I help organizations extract, monitor, and analyze large volumes of textual data using state-of-the-art Natural Language Processing, Machine Learning, and Generative AI. I work directly with them to design and implement custom solutions that turn this data into meaningful insights.

Custom Data Mining & Information Extraction with NLP and AI

I design and implement custom information extraction and data mining solutions that transform unstructured text into structured, usable data. Using modern NLP, machine learning, and generative AI techniques, I help clients:

  • Extract entities, relationships, events, and concepts from text
  • Classify and enrich documents at scale
  • Build domain-specific NLP pipelines tailored to their data and use cases
Solutions are fully customized and can be delivered as:
  • Stand-alone applications
  • Scalable APIs
  • Integrated components within existing systems

Typical use cases:

  • Automated document processing
  • Semantic search and content enrichment
  • Domain-specific text classification and tagging

Monitoring and Alerting from Large-Scale Text Streams

I build monitoring and intelligence systems that continuously analyze large streams of textual data such as news, social media, forums, or internal feeds. These systems help organizations:

  • Detect emerging trends, risks, or opportunities
  • Monitor topics, entities, and narratives over time
  • Generate alerts and dashboards based on semantic signals, not just keywords
The focus is on scalability, explainability, and actionable insights, combining NLP models with robust data pipelines and analytics dashboards.

Typical use cases:

  • Media and reputation monitoring
  • Trend and issue detection
  • Competitive or market intelligence
  • Early-warning systems from textual signals

Innovation Intelligence from Research, Patents, and Technical Data

I develop custom innovation intelligence solutions that analyze large collections of research papers, patents, project descriptions, and technical documents. By combining NLP, machine learning, and data analytics, I help organizations:

  • Map technology landscapes
  • Identify emerging research areas and trends
  • Analyze innovation trajectories and thematic evolution
  • Explore relationships between technologies, organizations, and concepts
Solutions are designed to support strategic decision-making, R&D planning, and technology scouting.

Typical use cases:

  • Technology and patent landscape analysis
  • Research trend detection
  • Innovation monitoring and foresight
  • Strategic R&D support

Custom Knowledge Graph Generation and Knowledge Engineering Solutions

I design and implement custom knowledge graph and knowledge engineering solutions that transform large, heterogeneous document collections into structured, connected, and machine-readable knowledge. Starting from vast repositories of corporate documents—particularly in technical, scientific, and medical domains—I build end-to-end pipelines that:

  • Extract entities, concepts, relations, and events using NLP and AI
  • Model domain knowledge using semantic schemas and ontologies
  • Integrate structured and unstructured data into unified knowledge graphs
These solutions enable advanced search, analytics, reasoning, and AI-driven applications over complex information assets.

Projects

Research projects, software, data analytics dashboards and dataset development in which I am or was involved:

    Media Monitoring for Forest Disturbance Incidents. Ongoing project for Unit D1 of the Joint Research Centre of the European Commission
    Research trend mapping in the AECO domain (Code) (Dashboard)
    CASE 2022 - Task 2: Automatically replicating manually created datasets of COVID-related protest events (Code)
    Socio-political and Crisis Events Detection at CASE@ACL-IJCNLP 2021 - Shared Task 3: Discovering Black Lives Matter events in the United States (Code)
    Security-related Event Corpus (Resources)
    EMM Open Source Intelligence Suite (EMM OSINT Suite)
    Frontex Real-time Event Extraction Framework (Overview)
    Europe Media Monitor (EMM)
    EXtraction PatteRn Engine and Specification Suite (ExPRESS)

Scientific Work

  • Vanni Zavarella & Juan Carlos Gamero-Salinas, Danilo Dessì, Sergio Consoli, Gianni Fenu, Diego Reforgiato Recupero. Mapping the AECO Research Landscape using Topic Modeling, Bibliometrics and Information Extraction methods. Under review at IEEE Access.
  • Vanni Zavarella & Lorenzo Bertolini, Sergio Consoli, Gianni Fenu, Diego Reforgiato Recupero, Alessandro Zani. Leveraging Large Language Models for Causal Relation Extraction in Biomedical Texts. Under review at Information Processing and Management, Special Issue on Causal Reasoning in Language Models.
  • Zavarella, Vanni, Bertolini, Lorenzo, Consoli, Sergio, Fenu, Gianni, Recupero, Diego Reforgiato, Zani, Alessandro (2025). LLM-Powered Knowledge Graph of Causal Relations in Drug Reviews. CEUR WORKSHOP PROCEEDINGS.
  • Zavarella, Vanni, Consoli, Sergio, Recupero, Diego Reforgiato, Fenu, Gianni (2024). Exploring Digital Health Trends in the Headlines via Knowledge Graph Analysis. International Conference on Machine Learning, Optimization, and Data Science.
  • Dessi, Danilo, Fenu, Gianni, Consoli, Sergio, Zavarella, Vanni, Osborne, Francesco, Buscaldi, Davide, Angioni, Simone, Recupero, Diego Reforgiato (2024). Knowledge Graphs for Digital Transformation Monitoring in Social Media. TEXT2KG/DQMLKG@ESWC.
  • Zavarella, Vanni, Reforgiato Recupero, D, Consoli, Sergio, Fenu, Gianni, Angioni, Simone, Buscaldi, Davide, Dessi, Danilo, Osborne, Francesco, others (2024). Knowledge Graphs for Digital Transformation Monitoring in Social Media. CEUR Workshop Proceedings.
  • Zavarella, Vanni, Gamero-Salinas, Juan Carlos, Consoli, Sergio (2024). A few-shot approach for relation extraction domain adaptation using large language models. arXiv preprint arXiv:2408.02377.
  • Zavarella, Vanni, Reforgiato, Diego, Consoli, Sergio, Fenu, Gianni (2024). Charting the Landscape of Digital Health: Towards A Knowledge Graph Approach to News Media Analysis. Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization.
  • Tanev, Hristo, Zavarella, Vanni, Piskorski, Jakub, Yeniterzi, Reyyan, Yoruk, Erdem (2021). Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report. arXiv preprint arXiv:2108.07865.
  • Zavarella, Vanni, Consoli, Sergio, Recupero, Diego Reforgiato, Fenu, Gianni, Angioni, Simone, Buscaldi, Davide, Dessi, Danilo, Osborne, Francesco (2024). Triplétoile: Extraction of knowledge from microblogging text. Heliyon.
  • Hürriyetoğlu, Ali, Tanev, Hristo, Zavarella, Vanni, Yeniterzi, Reyyan, Yoruk, Erdem, Slavcheva, Milena (2023). Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text.
  • Tanev, Hristo, Stefanovitch, Nicolas, Halterman, Andrew, Uca, Onur, Zavarella, Vanni, Hürriyetoğlu, Ali, De Longueville, Bertrand, Della Rocca, Leonida (2023). Detecting and geocoding battle events from social media messages on the Russo-Ukrainian war: Shared task 2, CASE 2023. Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text.
  • Zavarella, Vanni, Tanev, Hristo, Hürriyetoğlu, Ali, Wiriyathammabhum, Peratham, De Longueville, Bertrand (2022). Tracking COVID-19 protest events in the United States. Shared task 2: Event database replication, CASE 2022. Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE).
  • Hürriyetoğlu, Ali, Tanev, Hristo, Zavarella, Vanni, Yoruk, Erdem (2022). Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE). Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE).
  • Kutuzov, A, Velldal, E, Øvrelid, L, Atkinson, M, Piskorski, J, Tanev, H, Zavarella, V (2017). EventStory 2017: Events and Stories in the News, Proceedings of the Workshop.
  • ATKINSON, Martin, PISKORSKI, Jakub, TANEV, Hristo, VAN, DER, ZAVARELLA, Vanni, YANGARBER, Roman, others. Automated Event Extraction for Border Security.
  • Piskorski, Jakub, Šarić, Fredi, Zavarella, Vanni, Atkinson, Martin (2021). Exploring Machine Learning Techniques for Linking Event Templates. Computational Analysis of Storylines: Making Sense of Events.
  • Hürriyetoğlu, Ali, Tanev, Hristo, Zavarella, Vanni, Piskorski, Jakub, Yeniterzi, Reyyan, Mutlu, Osman, Yuret, Deniz, Villavicencio, Aline (2021). Challenges and applications of automated extraction of socio-political events from text (CASE 2021): Workshop and shared task report. Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021).
  • Giorgi, Salvatore, Zavarella, Vanni, Tanev, Hristo, Stefanovitch, Nicolas, Hwang, Sy, Hettiarachchi, Hansi, Ranasinghe, Tharindu, Kalyan, Vivek, Tan, Paul, Tan, Shaun, others (2021). Discovering Black Lives Matter events in the United States: Shared task 3, CASE 2021. Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021).
  • Zavarella, Vanni, Piskorski, Jakub, Ignat, Camelia, Tanev, Hristo, Atkinson, Martin (2020). Mastering the Media Hype: Methods for Deduplication of Conflict Events from News Reports. AI4Narratives@IJCAI.
  • Hürriyetoğlu, Ali, Yoruk, Erdem, Zavarella, Vanni, Tanev, Hristo (2020). Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020. Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020.
  • Hürriyetoğlu, Ali, Zavarella, Vanni, Tanev, Hristo, Yoruk, Erdem, Safaya, Ali, Mutlu, Osman (2020). Automated extraction of socio-political events from news (AESPEN): Workshop and shared task report. arXiv preprint arXiv:2005.06070.
  • Piskorski, Jakub, Zavarella, Vanni, Atkinson, Martin, Verile, Marco (2020). Timelines: Entity-centric Event Extraction from Online News. Text2Story@ECIR.
  • Hürriyetoğlu, Ali, Zavarella, Vanni. Workshop on Automated Extraction of Socio-political Events from News.
  • Piskorski, Jakub, Zavarella, Vanni, Atkinson, Martin (2018). On the development of an entity-centric timeline extraction tool. 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).
  • Piskorski, Jakub, Šarić, Fredi, Zavarella, Vanni, Atkinson, Martin (2018). On training classifiers for linking event templates. Proceedings of the Workshop Events and Stories in the News 2018.
  • Consoli, Sergio, Recupero, Diego Reforgiato, Zavarella, Vanni (2014). A survey on tidal analysis and forecasting methods for Tsunami detection. arXiv preprint arXiv:1403.0135.
  • Tanev, Hristo, Zavarella, Vanni, Steinberger, Josef (2017). Monitoring disaster impact: detecting micro-events and eyewitness reports in mainstream and social media. ISCRAM.
  • Atkinson, Martin, Piskorski, Jakub, Tanev, Hristo, Zavarella, Vanni (2017). On the creation of a security-related event corpus. Proceedings of the Events and Stories in the News Workshop.
  • Linge, Jens P, Verile, Marco, Tanev, Hristo, Zavarella, Vanni, Fuart, Flavio, van der Goot, Erik (2012). Media monitoring of public health threats with MediSys. C. WILLIAM, CWR. WEBSTER, D. BALAHUR, et al.
  • Steinberger, Ralf, Podavini, Aldo, Balahur, Alexandra, Jacquet, Guillaume, Tanev, Hristo, Linge, Jens, Atkinson, Martin, Chinosi, Michele, Zavarella, Vanni, Steiner, Yaniv, others (2016). Observing trends in automated multilingual media analysis. arXiv preprint arXiv:1603.02604.
  • Zavarella, Vanni, Kucuk, Dilek, Tanev, Hristo, Hürriyetoğlu, Ali (2014). Event extraction for Balkan languages. Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics.
  • Steinberger, Josef, Kabadjov, Mijail, Steinberger, Ralf, Tanev, Hristo, Turchi, Marco, Zavarella, Vanni (2012). Towards language-independent news summarization. Proceedings of the Text Analysis Conference (TAC). NIST.
  • Tanev, Hristo, Zavarella, Vanni (2014). Multilingual lexicalisation and population of event ontologies: A case study for social media. Towards the Multilingual Semantic Web: Principles, Methods and Applications.
  • Balahur, Alexandra, Turchi, Marco, Steinberger, Ralf, Ortega, José Manuel Perea, Jacquet, Guillaume, Kucuk, Dilek, Zavarella, Vanni, El Ghali, Adil (2014). Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts. LREC.
  • Zavarella, Vanni, Tanev, Hristo, Steinberger, Ralf, Van der Goot, Erik (2014). An Ontology-Based Approach to Social Media Mining for Crisis Management. SSA-SMILE@ESWC.
  • Steinberger, Josef, Steinberger, Ralf, Tanev, Hristo, Zavarella, Vanni, Turchi, Marco (2014). Aspects of Multilingual News Summarisation. Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding.
  • Zavarella, Vanni, Tanev, Hristo (2013). FSS-TimEx for TempEval-3: Extracting temporal information from text. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013).
  • Atkinson, Martin, Du, Mian, Piskorski, Jakub, Tanev, Hristo, Yangarber, Roman, Zavarella, Vanni (2013). Techniques for multilingual security-related event extraction from online news. Computational Linguistics: Applications.
  • Zavarella, Vanni, Tanev, Hristo, Linge, Jens, Piskorski, Jakub, Atkinson, Martin, Steinberger, Ralf (2010). Exploiting multilingual grammars and machine learning techniques to build an event extraction system for Portuguese. International Conference on Computational Processing of the Portuguese Language.
  • Zavarella, Vanni, Piskorski, Jakub, Esteves, Ana Sofia, Bucci, Stefano (2012). Refining Border Security News Event Geotagging through Deployment of Lexico-Semantic Patterns. 2012 European Intelligence and Security Informatics Conference.
  • Atkinson, Martin, Piskorski, Jakub, Tanev, Hristo, van der Goot, Eric, Yangarber, Roman, Zavarella, Vanni (2009). Automated event extraction in the domain of border security. International Conference on User Centric Media.
  • Atkinson, Martin, Belayeva, Jenya, Zavarella, Vanni, Piskorski, Jakub, Huttunen, Silja, Vihavainen, Arto, Yangarber, Roman (2010). News mining for border security intelligence. 2010 IEEE international conference on intelligence and security informatics.
  • Tanev, Hristo, Pouliquen, Bruno, Zavarella, Vanni, Steinberger, Ralf (2010). Automatic expansion of a social network using sentiment analysis. Data Mining for Social Network Data.
  • Turchi, Marco, Zavarella, Vanni, Tanev, Hristo (2011). Pattern learning for event extraction using monolingual statistical machine translation. Proceedings of the International Conference Recent Advances in Natural Language Processing 2011.
  • Tanev, Hristo, Ehrmann, Maud, Piskorski, Jakub, Zavarella, Vanni (2012). Enhancing event descriptions through twitter mining. Proceedings of the International AAAI Conference on Web and Social Media.
  • Atkinson, Martin, Piskorski, Jakub, Pouliquen, Bruno, Steinberger, Ralf, Tanev, Hristo, Zavarella, Vanni (2008). Online-monitoring of security-related events. Coling 2008: Companion volume: Demonstrations.
  • Piskorski, Jakub, Atkinson, Martin, Belyaeva, Jenya, Zavarella, Vanni, Huttunen, Silja, Yangarber, Roman (2010). Real-time text mining in multilingual news for the creation of a pre-frontier intelligence picture. ACM SIGKDD Workshop on Intelligence and Security Informatics.
  • Steinberger, Josef, Kabadjov, Mijail A, Steinberger, Ralf, Tanev, Hristo, Turchi, Marco, Zavarella, Vanni (2011). JRC's Participation at TAC 2011: Guided and MultiLingual Summarization Tasks. TAC.
  • Piskorski, Jakub, Tanev, Hristo, Atkinson, Martin, Van Der Goot, Eric, Zavarella, Vanni (2011). Online news event extraction for global crisis surveillance. Transactions on computational collective intelligence V.
  • Zavarella, Vanni, Tanev, Hristo, Piskorski, Jakub (2009). Event extraction for Italian using a cascade of finite-state grammars. Finite-State Methods and Natural Language Processing.
  • Tanev, Hristo, Zavarella, Vanni, Linge, Jens, Kabadjov, Mijail, Piskorski, Jakub, Atkinson, Martin, Steinberger, Ralf (2009). Exploiting machine learning techniques to build an event extraction system for Portuguese and Spanish. Linguamática.
  • Steinberger, Josef, Ebrahim, Mohamed, Ehrmann, Maud, Hürriyetoğlu, Ali, Kabadjov, Mijail, Lenkova, Polina, Steinberger, Ralf, Tanev, Hristo, Vázquez, Silvia, Zavarella, Vanni (2012). Creating sentiment dictionaries via triangulation. Decision Support Systems.
  • Balahur, Alexandra, Steinberger, Ralf, Kabadjov, Mijail, Zavarella, Vanni, Van Der Goot, Erik, Halkia, Matina, Pouliquen, Bruno, Belyaeva, Jenya (2013). Sentiment analysis in the news. arXiv preprint arXiv:1309.6202.

Get in touch

Interested in discussing a project or exploring how AI can support your data challenges? Get in touch to describe your use case and data, and I’ll get back to you promptly.

Legal Notice

In compliance with Spanish Law 34/2002 of 11 July on Information Society Services and Electronic Commerce (LSSI-CE), users are informed of the following details:

Owner details

  • Owner: Vanni Zavarella
  • NIF / VAT: Z0065148V
  • Professional address: Avenida Eulza, 31010 Barañain, Spain
  • Email: vanni.zavarella@zavasemanticslab.org

Purpose

The purpose of this website is to provide information about professional development and consulting services in Natural Language Processing (NLP), Data Science and Artificial Intelligence.

Terms of use

Accessing and using this website confers the status of user, who accepts these terms of use from the moment of such access and/or use. The user undertakes to make appropriate use of the contents and services.

Intellectual and industrial property

All contents of the website, including texts, designs, source code, logos and graphics, are owned by the site owner or licensed for its use, and are protected by current intellectual and industrial property legislation.

Liability

The owner is not liable for damages arising from the use of the information contained in this website, nor for possible errors or omissions in the contents.

Applicable law

The relationship between the owner and the user shall be governed by the legislation in force in Spain, and any dispute shall be submitted to the competent courts and tribunals.