Introduction to Event Registry Python SDK

This page introduces the Python SDK for searching the articles and events provided by NewsApi.ai. It describes how to install and import the SDK and how to use its most relevant classes to search for news articles or events.

You can also try and modify this Jupyter notebook interactively here.

Registering for a free account

To use the SDK you need your own API key, which you get by registering for a free account. The free account includes 2,000 tokens. Once you have used them, you'll need to subscribe to a paid plan to continue using the service. You can monitor your token usage on your dashboard.

Installing and importing the Event Registry module

To use Event Registry, you need the Python module called eventregistry. To install it, open a command prompt and run:

pip install eventregistry

To start using the module, import it:

In [2]:
from eventregistry import *
import json, os, sys

The main class for interacting with the Event Registry service is called EventRegistry. Its constructor accepts an apiKey parameter, which you need to provide in order to make more than a trivial number of requests.

In [5]:
er = EventRegistry(apiKey = "YOUR_API_KEY", allowUseOfArchive=False)
using user provided API key for making requests
Event Registry host: http://eventregistry.org
Text analytics host: http://analytics.eventregistry.org

The additional constructor parameter, allowUseOfArchive=False, states that searches should only cover content published within the last month. If you also need to search older content, remove this parameter (note that free users don't have access to the archive, regardless of this parameter's value).
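Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you keep the key in an environment variable; the variable name ER_API_KEY and the helper load_api_key are hypothetical and not part of the SDK:

```python
import os


def load_api_key(var_name: str = "ER_API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    ER_API_KEY is a hypothetical variable name chosen for this example;
    the SDK does not read it automatically.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Please set the {var_name} environment variable to your API key")
    return key
```

You would then create the client with er = EventRegistry(apiKey = load_api_key(), allowUseOfArchive=False).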

A few example queries

To get you started, here are a few example queries. The queries and the properties they use are described in detail in the sections below.

Ex 1: The most relevant articles about Donald Trump or Boris Johnson, written by the New York Times on the topic of Business:

In [48]:
q = QueryArticlesIter(
    keywords = QueryItems.OR(["Donald Trump", "Boris Johnson"]),
    sourceUri= er.getSourceUri("New York Times"),
    categoryUri = er.getCategoryUri("Business"))

print("Number of results: %d" % q.count(er))

for art in q.execQuery(er, sortBy = "rel", maxItems = 2):
    print(json.dumps(art, indent=4))
Number of results: 66
{
    "uri": "1250531359",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-09",
    "time": "10:44:00",
    "dateTime": "2019-09-09T10:44:00Z",
    "dataType": "news",
    "sim": 0.5921568870544434,
    "url": "https://www.nytimes.com/2019/09/09/business/dealbook/mit-media-lab-jeffrey-epstein.html",
    "title": "DealBook Briefing: How an M.I.T. Lab Hid Links to Jeffrey Epstein",
    "body": "Good Monday morning. Programming note: I'm going to be in conversation with Blackstone's Stephen Schwarzman ... (shortened for brevity)",
    "source": {
        "uri": "nytimes.com",
        "dataType": "news",
        "title": "The New York Times"
    },
    "authors": [],
    "image": "https://static01.nyt.com/images/2019/09/09/business/09db-newsletter-ito/merlin_160372998_d07067fd-970a-46f5-9f12-3eb124df1fb1-videoSixteenByNineJumbo1600.jpg",
    "eventUri": "spa-1640439",
    "sentiment": 0.01960784313725483,
    "wgt": 305721840
}
{
    "uri": "1250484368",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-09",
    "time": "10:05:00",
    "dateTime": "2019-09-09T10:05:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://www.nytimes.com/2019/09/09/podcasts/the-daily/parliament-strikes-back-in-britain.html",
    "title": "Parliament Strikes Back in Britain",
    "body": "Listen and subscribe to our podcast from your mobile device:\n\nVia Apple Podcasts | Via RadioPublic | Via Stitcher\n\nIn a battle ... (shortened for brevity)",
    "source": {
        "uri": "nytimes.com",
        "dataType": "news",
        "title": "The New York Times"
    },
    "authors": [],
    "image": "https://static01.nyt.com/images/2019/09/09/podcasts/09daily2/09daily2-videoSixteenByNineJumbo1600.jpg",
    "eventUri": null,
    "sentiment": 0.03529411764705892,
    "wgt": 305719500
}

Ex 2: The latest articles in Chinese or Arabic about Apple:

In [39]:
q = QueryArticlesIter(
    conceptUri = er.getConceptUri("Apple"),
    lang = QueryItems.OR(["ara", "zho"]))

for art in q.execQuery(er, sortBy = "date", maxItems = 2):
    print(json.dumps(art, indent=4))
{
    "uri": "1244491882",
    "lang": "ara",
    "isDuplicate": false,
    "date": "2019-09-08",
    "time": "18:12:00",
    "dateTime": "2019-09-08T18:12:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://www.youm7.com/story/2019/9/8/\u0645\u0633\u062a\u062e\u062f\u0645\u0648-\u0647\u0630\u0647-\u0627\u0644\u0647\u0648\u0627\u062a\u0641-\u0623\u0643\u062b\u0631-\u0627\u0644\u0645\u062a\u0636\u0631\u0631\u064a\u0646-\u0645\u0646-\u0625\u0637\u0644\u0627\u0642-\u0623\u064a\u0641\u0648\u0646-11-\u0627\u0644\u062c\u062f\u064a\u062f/4408241",
    "title": "\u0645\u0633\u062a\u062e\u062f\u0645\u0648 \u0647\u0630\u0647 \u0627\u0644\u0647\u0648\u0627\u062a\u0641 \u0623\u0643\u062b\u0631 \u0627\u0644\u0645\u062a\u0636\u0631\u0631\u064a\u0646 \u0645\u0646 \u0625\u0637\u0644\u0627\u0642 \u0623\u064a\u0641\u0648\u0646 11 \u0627\u0644\u062c\u062f\u064a\u062f.. \u0627\u0639\u0631\u0641\u0647\u0645 - \u0627\u0644\u064a\u0648\u0645 \u0627\u0644\u0633\u0627\u0628\u0639",
    "body": "\u0627\u0636\u0641 \u062a\u0639\u0644\u064a\u0642\u0627\u064b \u0648\u0627\u0642\u0631\u0623 \u062a\u0639\u0644\u064a\u0642\u0627\u062a \u0627\u0644\u0642\u0631\u0627\u0621\n\n\u064a\u0639\u062f ... (shortened for brevity)",
    "source": {
        "uri": "youm7.com",
        "dataType": "news",
        "title": "Youm7"
    },
    "authors": [],
    "image": "https://img.youm7.com/large/201906021242484248.jpg",
    "eventUri": null,
    "sentiment": null,
    "wgt": 305662320
}
{
    "uri": "1244490961",
    "lang": "ara",
    "isDuplicate": false,
    "date": "2019-09-08",
    "time": "18:11:00",
    "dateTime": "2019-09-08T18:11:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://www.dostor.org/2817577",
    "title": "\u0645\u0648\u0627\u0635\u0641\u0627\u062a \u0627\u0644\u062c\u064a\u0644 \u0627\u0644\u062b\u0627\u0644\u062b \u0644\u0633\u0645\u0627\u0639\u0627\u062a \u0647\u0648\u0627\u0648\u064a",
    "body": "\u0627\u0644\u0633\u0645\u0627\u0639\u0629 \u0627\u0644\u062c\u062f\u064a\u062f\u0629 \u062a\u0634\u0628\u0647 \u0644\u062d\u062f \u0643\u0628\u064a\u0631 ... (shortened for brevity)",
    "source": {
        "uri": "dostor.org",
        "dataType": "news",
        "title": "Dostor"
    },
    "authors": [],
    "image": "https://www.dostor.org/upload/photo/news/281/7/500x282o/577.jpg?q=2",
    "eventUri": null,
    "sentiment": null,
    "wgt": 305662260
}
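The titles and bodies above appear as \uXXXX escape sequences because json.dumps escapes every non-ASCII character by default (ensure_ascii=True). Passing ensure_ascii=False keeps the original script. A minimal sketch using a trimmed-down article dict in place of a live result:

```python
import json

# A trimmed-down article dict with the same shape as the results above.
# The title is the first word of the Dostor headline, written here as escapes.
art = {
    "uri": "1244490961",
    "lang": "ara",
    "title": "\u0645\u0648\u0627\u0635\u0641\u0627\u062a",
}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape
escaped = json.dumps(art, indent=4)

# ensure_ascii=False keeps the Arabic characters as they are
readable = json.dumps(art, indent=4, ensure_ascii=False)
```

In Ex 2 you would change the loop body to print(json.dumps(art, indent=4, ensure_ascii=False)); whether the characters then display correctly depends on the encoding of your terminal or notebook.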

Ex 3: Largest recent events on the topic of Brexit:

In [43]:
q = QueryEventsIter(keywords = "Brexit")

for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent=4))

{
    "uri": "eng-5028293",
    "concepts": [
        {
            "uri": "http://en.wikipedia.org/wiki/Boris_Johnson",
            "type": "person",
            "score": 100,
            "label": {
                "eng": "Boris Johnson"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Brexit",
            "type": "wiki",
            "score": 86,
            "label": {
                "eng": "Brexit"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Parliament",
            "type": "wiki",
            "score": 79,
            "label": {
                "eng": "Parliament"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/United_Kingdom",
            "type": "loc",
            "score": 71,
            "label": {
                "eng": "United Kingdom"
            },
            "location": {
                "type": "country",
                "label": {
                    "eng": "United Kingdom"
                }
            }
        },
        ... (shortened for brevity)
    ],
    "eventDate": "2019-08-27",
    "totalArticleCount": 2800,
    "title": {
        "eng": "Government asks Queen to suspend Parliament",
        "fra": "Brexit: Hugh Grant \u00e9trille Boris Johnson et lui demande d'aller \" se faire foutre \"",
        "zho": "\u82f1\u56fd\u8131\u6b27\uff1a\u8bae\u4f1a\u88ab\u505c\u6446\u5982\u4f55\u963b\u6b62\u65e0\u534f\u8bae\u786c\u8131",
        "por": "Brexit a f\u00f3rceps - 30/08/2019 - Opini\u00e3o - Folha",
        "spa": "Boris Johnson planea pedir a la Reina que suspenda el Parlamento"
    },
    "summary": {
        "eng": "The government has asked the Queen to suspend Parliament just days after MPs return to work in September - and only a ... (shortened for brevity)",
        "fra": "L'acteur britannique a vivement r\u00e9agi sur Twitter \u00e0 la d\u00e9cision du Premier ministre de suspendre le ... (shortened for brevity)",
        "zho": "\u8fd9\u662f\u5916\u90e8\u94fe\u63a5\uff0c\u6d4f\u89c8\u5668\u5c06\u6253\u5f00\u53e6\u4e00\u4e2a\u7a97\u53e3\n\n\u79bb10 ... (shortened for brevity)",
        "por": "Faltando apenas dois meses para o fim do prazo acordado para o brexit, a sa\u00edda do Reino Unido da Uni\u00e3o Europeia (UE), ... (shortened for brevity)",
        "spa": "Reino Unido. Acuerdo de la oposici\u00f3n para frenar el Brexit sin acuerdo por la v\u00eda legislativa\n\nLa Reina Isabel II ... (shortened for brevity)"
    },
    "location": {
        "type": "place",
        "label": {
            "eng": "London"
        },
        "country": {
            "type": "country",
            "label": {
                "eng": "United Kingdom"
            }
        }
    },
    "categories": [
        {
            "uri": "dmoz/Society/Government",
            "label": "dmoz/Society/Government",
            "wgt": 11
        },
        {
            "uri": "dmoz/Society/Politics",
            "label": "dmoz/Society/Politics",
            "wgt": 12
        },
        {
            "uri": "dmoz/Society/Government/Parliaments_and_Legislatures",
            "label": "dmoz/Society/Government/Parliaments and Legislatures",
            "wgt": 24
        },
        {
            "uri": "news/Politics",
            "label": "news/Politics",
            "wgt": 91
        }
    ],
    "articleCounts": {
        "eng": 827,
        "fra": 341,
        "zho": 325,
        "por": 204,
        "spa": 257
    },
    "sentiment": -0.05882352941176472,
    "wgt": 2800
}
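Event titles and summaries are dictionaries keyed by language code, and not every event has a title in every language. A small helper, hypothetical and not part of the SDK, that picks the first available language from a preference list:

```python
from typing import Optional, Sequence


def pick_label(labels: dict, preferred: Sequence[str] = ("eng",)) -> Optional[str]:
    """Return the label in the first preferred language present in `labels`.

    Falls back to the first available language if none of the preferred ones
    exist, and returns None for an empty dict.
    """
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    # Fall back to whatever language comes first in the dict
    return next(iter(labels.values()), None)


# Shape taken from the Brexit event above (titles reduced to two languages)
event = {
    "uri": "eng-5028293",
    "title": {
        "eng": "Government asks Queen to suspend Parliament",
        "spa": "Boris Johnson planea pedir a la Reina que suspenda el Parlamento",
    },
}
```

For example, pick_label(event["title"], ["spa", "eng"]) returns the Spanish title, while pick_label(event["title"], ["deu"]) falls back to the English one.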

Auto-suggestion methods

Several API calls accept parameters that are unique identifiers (URIs), for example for concepts, categories and sources. If you only know the name or label of such an item, you can use the auto-suggest methods to obtain its unique identifier.

If you know that there is a category for Investing, then you can get the URI for it like this:

In [15]:
er.getCategoryUri("investing")
Out[15]:
'dmoz/Business/Investing'

Similarly, if you want to filter based on the sources, then you can get the source URI by providing the source name or the domain name:

In [7]:
print(er.getSourceUri("new york times"))
print(er.getSourceUri("nytimes"))
nytimes.com
nytimes.com

For concepts, the URIs are URLs of the corresponding Wikipedia pages.

In [20]:
er.getConceptUri("Obama")
Out[20]:
'http://en.wikipedia.org/wiki/Barack_Obama'

Auto-suggestion even works for company stock tickers:

In [46]:
er.getConceptUri("AAPL")
Out[46]:
'http://en.wikipedia.org/wiki/Apple_Inc.'
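If a label has no match, we assume the lookup returns None; passing None as a filter would most likely just drop that filter and silently broaden the query. A minimal sketch of a hypothetical wrapper that fails loudly instead, tested here with a dictionary-based stand-in rather than a real EventRegistry instance:

```python
from typing import Callable, Optional


def require_uri(lookup: Callable[[str], Optional[str]], label: str) -> str:
    """Call an auto-suggest lookup and raise if it finds nothing.

    `lookup` is any callable such as er.getConceptUri or er.getSourceUri;
    we assume it returns None when there is no match.
    """
    uri = lookup(label)
    if uri is None:
        raise LookupError(f"No URI found for {label!r}")
    return uri


# Stand-in for er.getConceptUri so the example runs without an API key
fake_suggestions = {"AAPL": "http://en.wikipedia.org/wiki/Apple_Inc."}
fake_lookup = fake_suggestions.get
```

With a real client you would write conceptUri = require_uri(er.getConceptUri, "AAPL").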

Searching for articles

There are two classes for searching articles: QueryArticlesIter and QueryArticles. Use QueryArticlesIter when you simply want to download the articles matching a query. Use QueryArticles when you need various summaries of the results, such as the top concepts, top sources or top authors.

Both classes accept several filters in the constructor, such as:

  • keywords - find articles that mention the keywords or phrases
  • conceptUri - find articles that mention the concept(s)
  • categoryUri - find articles that are about one or more categories
  • lang - find articles written in the given language
  • dateStart - find articles that were written on the given date or later (in the YYYY-MM-DD format)
  • dateEnd - find articles that were written before or on the given date (in the YYYY-MM-DD format)
  • sourceUri - find articles written by the given publisher(s)
  • sourceLocationUri - find articles written by publishers located in the given location (city or country)
  • authorUri - find articles written by the given author(s)
  • locationUri - find articles that mention the given location in the article dateline
  • keywordsLoc - where to search for the keywords, if any are provided: title or body (default)
  • minSentiment, maxSentiment - min and max value of the sentiment (from -1 to 1)
  • startSourceRankPercentile - starting percentile rank of the sources to consider in the results (default: 0). Value should be in range 0-90 and divisible by 10.
  • endSourceRankPercentile - ending percentile rank of the sources to consider in the results (default: 100). Value should be in range 10-100 and divisible by 10.
  • ignoreKeywords, ignoreConceptUri, ignoreCategoryUri, ... - from the articles that match the rest of the conditions, exclude the articles that match any of the provided filters
  • dataType - which data types should be included in the results - news (default), blog or pr

When multiple filters are specified, the results have to match all of the provided filters. For example, when keywords and sources are specified, the results will be articles written by these sources that mention the provided keywords.

If you want a search that returns results matching any of the specified filters, you have to use the Advanced Query Language.
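The date and source-rank filters listed above have format rules that are easy to get wrong. A minimal client-side sketch with two hypothetical helpers (the SDK does not provide them): one formats a YYYY-MM-DD range for the last few days, the other checks the percentile rules stated in the list:

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def recent_date_range(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Return (dateStart, dateEnd) in YYYY-MM-DD format, starting `days` days before `today`."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


def check_rank_percentiles(start: int = 0, end: int = 100) -> None:
    """Raise ValueError if the source-rank percentiles break the rules listed above."""
    if not (0 <= start <= 90 and start % 10 == 0):
        raise ValueError("startSourceRankPercentile must be in 0-90 and divisible by 10")
    if not (10 <= end <= 100 and end % 10 == 0):
        raise ValueError("endSourceRankPercentile must be in 10-100 and divisible by 10")
    # Assumption: an empty or inverted range is a mistake, so require start < end
    if start >= end:
        raise ValueError("startSourceRankPercentile must be lower than endSourceRankPercentile")
```

You could then pass dateStart and dateEnd from recent_date_range(7) to QueryArticlesIter together with percentiles that passed check_rank_percentiles.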

After creating a QueryArticlesIter instance, you retrieve the resulting articles by calling its execQuery method. The method iterates over the resulting articles, so you can use it directly in a for loop. In the call you also need to pass your EventRegistry instance, which is used to download the matching articles in batches of 100 items per call.

q = QueryArticlesIter(keywords="Tesla")
for art in q.execQuery(er, sortBy = "date", maxItems = 300):
    print(art)

Please note two important parameters:

  • sortBy determines how the articles are sorted before they are retrieved. Besides date, you can also sort by relevance, source importance, social media shares and others.
  • maxItems determines how many of the matching articles to retrieve before the for loop finishes. Always set this parameter unless you really want to download all matching results.

Please check the documentation page to see the full list of parameters and their descriptions related to the execQuery method.
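To see why maxItems matters, here is a simplified, hypothetical model of batched iteration; it is not the SDK's actual implementation. fetch_page stands in for one API call returning a page of results, and a fake "server" records how many calls are made:

```python
from typing import Callable, Iterator, List


def iterate_results(fetch_page: Callable[[int, int], List[dict]],
                    max_items: int = -1,
                    page_size: int = 100) -> Iterator[dict]:
    """Yield results page by page until `max_items` are returned or the pages run out.

    `fetch_page(page, count)` is a hypothetical stand-in for one API call that
    returns up to `count` results for the 1-based page number `page`.
    In this sketch, max_items=-1 means "no limit".
    """
    returned = 0
    page = 1
    while max_items == -1 or returned < max_items:
        batch = fetch_page(page, page_size)
        for item in batch:
            if max_items != -1 and returned >= max_items:
                return
            yield item
            returned += 1
        if len(batch) < page_size:
            return  # a short (or empty) page means there is nothing more to fetch
        page += 1


# A fake "server" with 250 matching results that records which pages were requested
ALL_RESULTS = [{"uri": str(i)} for i in range(250)]
calls = []


def fake_fetch(page: int, count: int) -> List[dict]:
    calls.append(page)
    start = (page - 1) * count
    return ALL_RESULTS[start:start + count]
```

With max_items=150 this model makes two calls; without a limit it keeps fetching until the results run out, which for a popular keyword could mean a very large number of calls.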

Using QueryItems.AND() and QueryItems.OR() when providing a list of filters of the same type

When you want to provide several keywords, concepts, categories, etc., you have to state explicitly whether the results should mention all of them or any of them.

To do that, use the QueryItems.AND() and QueryItems.OR() methods:

In [47]:
q = QueryArticlesIter(keywords = QueryItems.OR(["Samsung", "Apple", "Google"]))
print("Count with any of the companies: %d" % q.count(er))

q = QueryArticlesIter(keywords = "Samsung")
print("Count mentioning Samsung: %d" % q.count(er))
Count with any of the companies: 216551
Count mentioning Samsung: 38654
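Conceptually, QueryItems.OR matches an article that mentions at least one of the items, while QueryItems.AND requires all of them, which is why the OR count above is larger than the count for Samsung alone. The toy sketch below illustrates only these semantics, using naive substring matching on made-up headlines; the service's real keyword matching is more sophisticated:

```python
from typing import List


def matches(text: str, keywords: List[str], mode: str = "OR") -> bool:
    """Toy illustration of OR/AND semantics using case-insensitive substring tests."""
    text = text.lower()
    hits = [kw.lower() in text for kw in keywords]
    if mode == "OR":
        return any(hits)
    if mode == "AND":
        return all(hits)
    raise ValueError("mode must be 'OR' or 'AND'")


# Made-up headlines, not real search results
articles = [
    "Samsung unveils a new foldable phone",
    "Apple and Google face antitrust scrutiny",
    "Samsung, Apple and Google dominate the smartphone market",
    "Oil prices rise",
]
companies = ["Samsung", "Apple", "Google"]

or_count = sum(matches(a, companies, "OR") for a in articles)     # mentions any company
and_count = sum(matches(a, companies, "AND") for a in articles)   # mentions all three
samsung_count = sum(matches(a, ["Samsung"]) for a in articles)    # mentions Samsung
```

The same ordering holds for real queries: an AND over several items can never match more articles than any single item, and an OR can never match fewer. With the SDK, the AND case is written as keywords = QueryItems.AND(["Samsung", "Apple", "Google"]).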

Retrieving different properties about articles

Articles have many properties that you can retrieve. Some of them, such as the list of mentioned concepts, categories, links and videos, are not returned by default.

To modify which properties are returned, specify the returnInfo parameter, which is of type ReturnInfo. With ReturnInfo you can specify which properties to return for each type of returned object, such as articles, concepts, categories and events.

QueryArticlesIter(..., returnInfo = ReturnInfo(...))

ReturnInfo and its available parameters are described in detail here.

In [6]:
ReturnInfo(
    articleInfo = ArticleInfoFlags(),             # details about the articles to return
    eventInfo = EventInfoFlags(),                 # details about the events to return
    sourceInfo = SourceInfoFlags(),               # details about the news sources to return
    categoryInfo = CategoryInfoFlags(),           # details about the categories to return
    conceptInfo = ConceptInfoFlags(),             # details about the concepts to return
    locationInfo = LocationInfoFlags(),           # details about the locations to return
    storyInfo = StoryInfoFlags(),                 # details about the stories to return
    conceptClassInfo = ConceptClassInfoFlags(),   # details about the concept classes to return
    conceptFolderInfo = ConceptFolderInfoFlags()) # details about the concept folders to return

An example query that returns the article's concepts, categories, location and list of potential duplicates, as well as the location and image of its source:

In [9]:
q = QueryArticlesIter(keywords = "Trump", sourceUri = "nytimes.com")
for art in q.execQuery(er,
        sortBy = "date",
        maxItems = 1,
        returnInfo = ReturnInfo(
            articleInfo=ArticleInfoFlags(concepts=True, categories=True, duplicateList=True, location=True),
            sourceInfo=SourceInfoFlags(location=True, image=True)
        )):
    print(json.dumps(art, indent=4))
{
    "uri": "1253032895",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-11",
    "time": "04:07:00",
    "dateTime": "2019-09-11T04:07:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://www.nytimes.com/2019/09/10/pageoneplus/corrections-september-11-2019.html",
    "title": "Corrections: September 11, 2019.",
    "body": "INTERNATIONAL\n\nAn article on Sunday about an Iranian oil tanker anchored off the coast of Syria incorrectly described the release of satellite images of the Adrian Darya 1. The images were released by the space technology company Maxar Technologies, not by two companies.\n\n*\n\nAn article on Tuesday ...",
    "source": {
        "uri": "nytimes.com",
        "dataType": "news",
        "title": "The New York Times",
        "location": {
            "type": "place",
            "label": {
                "eng": "New York City"
            },
            "country": {
                "type": "country",
                "label": {
                    "eng": "United States"
                }
            }
        },
        "locationValidated": true,
        "image": "http://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_bigger.jpg",
        "thumbImage": "http://pbs.twimg.com/profile_images/758384037589348352/KB3RFwFm_mini.jpg",
        "favicon": "https://static01.nyt.com/favicon.ico"
    },
    "authors": [],
    "concepts": [
        {
            "uri": "http://en.wikipedia.org/wiki/Satellite_imagery",
            "type": "wiki",
            "score": 4,
            "label": {
                "eng": "Satellite imagery"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Oil_tanker",
            "type": "wiki",
            "score": 4,
            "label": {
                "eng": "Oil tanker"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Syria",
            "type": "loc",
            "score": 4,
            "label": {
                "eng": "Syria"
            },
            "location": {
                "type": "country",
                "label": {
                    "eng": "Syria"
                }
            }
        },
        ... (shortened for brevity)
    ],
    "categories": [
        {
            "uri": "dmoz/Society/Issues",
            "label": "dmoz/Society/Issues",
            "wgt": 17
        },
        {
            "uri": "dmoz/Society/Issues/Business",
            "label": "dmoz/Society/Issues/Business",
            "wgt": 15
        },
        {
            "uri": "dmoz/Society/Issues/Intellectual_Property",
            "label": "dmoz/Society/Issues/Intellectual Property",
            "wgt": 14
        },
        {
            "uri": "news/Politics",
            "label": "news/Politics",
            "wgt": 80
        }
    ],
    "image": "https://static01.nyt.com/newsgraphics/images/icons/defaultCrop.png",
    "duplicateList": [],
    "eventUri": null,
    "location": null,
    "sentiment": 0.1529411764705881,
    "wgt": 305870820
}
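Once concepts=True is requested, each article carries a concepts list like the one above. A minimal sketch (the helper name is hypothetical) that groups concept labels by type and keeps only concepts at or above a score threshold, using the three concepts shown in the output:

```python
from collections import defaultdict


def concepts_by_type(article: dict, min_score: int = 0, lang: str = "eng") -> dict:
    """Group an article's concept labels by concept type ("person", "loc", "wiki", ...)."""
    grouped = defaultdict(list)
    for concept in article.get("concepts", []):
        if concept.get("score", 0) >= min_score:
            # Use the label in `lang`, falling back to the concept URI
            grouped[concept["type"]].append(concept["label"].get(lang, concept["uri"]))
    return dict(grouped)


# The three concepts shown in the output above
art = {
    "concepts": [
        {"uri": "http://en.wikipedia.org/wiki/Satellite_imagery", "type": "wiki",
         "score": 4, "label": {"eng": "Satellite imagery"}},
        {"uri": "http://en.wikipedia.org/wiki/Oil_tanker", "type": "wiki",
         "score": 4, "label": {"eng": "Oil tanker"}},
        {"uri": "http://en.wikipedia.org/wiki/Syria", "type": "loc",
         "score": 4, "label": {"eng": "Syria"}},
    ]
}
```

Applied to this article, concepts_by_type(art) returns the two "wiki" concepts and Syria under "loc"; raising min_score above 4 filters all of them out.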

Creating complex queries

In some cases you might want to create a query that cannot be expressed with the simple QueryArticlesIter constructor. An example of such a query would be:

Give me articles that are on the topic of business or mention Tesla Inc.

Keep in mind that creating a query

QueryArticlesIter(
    conceptUri = er.getConceptUri("Tesla"),
    categoryUri = er.getCategoryUri("business"))

would return articles that are on the topic of business and also mention Tesla Inc., since all filters passed to the constructor must match.

So how can we create the correct query? For such cases you have to use the Advanced Query Language.

Using this language, the query described above looks like this:

In [3]:
qStr = """
{
    "$query": {
        "$or": [
            { "conceptUri": "%s" },
            { "categoryUri" : "%s"}
        ]
    }
}
""" % (er.getConceptUri("Tesla"), er.getCategoryUri("business"))

print(qStr)

q = QueryArticlesIter.initWithComplexQuery(qStr)
for art in q.execQuery(er, maxItems = 1):
    print(art)
{
    "$query": {
        "$or": [
            { "conceptUri": "http://en.wikipedia.org/wiki/Tesla,_Inc." },
            { "categoryUri" : "dmoz/Business"}
        ]
    }
}

{'uri': '1231479834', 'lang': 'eng', 'isDuplicate': False, 'date': '2019-08-28', 'time': '21:02:10', 'dateTime': '2019-08-28T21:02:10Z', 'dataType': 'news', 'sim': 0.9058823585510254, 'url': 'https://www.digitaltrends.com/news/tesla-launches-car-insurance-cheaper-rates/', 'title': 'Tesla Launches Car Insurance For Its Customers, Promising 30% Cheaper Rates | Digital Trends', 'body': 'Elon Musk\'s Tesla is getting into the insurance business: the company just launched Tesla Insurance to customers in California.\n\nThe company said on Wednesday ... (shortened for brevity)', 'source': {'uri': 'digitaltrends.com', 'dataType': 'news', 'title': 'Digital Trends'}, 'authors': [], 'image': 'https://icdn4.digitaltrends.com/image/digitaltrends/tesla-3-rear-510x0.jpg', 'eventUri': 'eng-5030245', 'sentiment': 0.3490196078431373, 'wgt': 125}
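Building the query string with % formatting works, but it breaks as soon as an inserted value contains a double quote or a backslash. A sketch of an alternative that builds the same query as a Python dict and serializes it with json.dumps, which escapes values correctly. The helper name or_query is hypothetical, and the URIs are hardcoded here so the example runs offline; in the notebook you would use er.getConceptUri("Tesla") and er.getCategoryUri("business"):

```python
import json


def or_query(*conditions: dict) -> str:
    """Return an Advanced Query Language string that ORs the given conditions."""
    return json.dumps({"$query": {"$or": list(conditions)}}, indent=4)


qStr = or_query(
    {"conceptUri": "http://en.wikipedia.org/wiki/Tesla,_Inc."},
    {"categoryUri": "dmoz/Business"},
)
```

The resulting string can be passed to QueryArticlesIter.initWithComplexQuery(qStr) exactly like the hand-written one above.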

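A more complex example could look like this. The query below matches articles that mention the concept Artificial Intelligence or both of the keywords "deep learning" and "machine learning", excludes articles with "data mining" in the title, and keeps only non-duplicate news and blog articles from the top 30% of sources by rank, with a sentiment of at least 0.2: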

In [8]:
qStr = """{
    "$query": {
        "$or": [
            { "conceptUri": "http://en.wikipedia.org/wiki/Artificial_Intelligence" },
            {
                "keyword": {
                    "$and": [ "deep learning", "machine learning" ]
                }
            }
        ],
        "$not": {
            "keyword": "data mining",
            "keywordLoc": "title"
        }
    },
    "$filter": {
        "dataType": ["news", "blog"],
        "isDuplicate": "skipDuplicates",
        "startSourceRankPercentile": 0,
        "endSourceRankPercentile": 30,
        "minSentiment": 0.2
    }
}"""

q = QueryArticlesIter.initWithComplexQuery(qStr)
for art in q.execQuery(er, maxItems = 1, sortBy = "date", returnInfo = ReturnInfo(articleInfo = ArticleInfoFlags(bodyLen=300))):
    print(json.dumps(art, indent=4))
{
    "uri": "1252338235",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-10",
    "time": "15:26:00",
    "dateTime": "2019-09-10T15:26:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://finance.yahoo.com/news/hpe-accelerates-artificial-intelligence-innovation-114500063.html",
    "title": "HPE Accelerates Artificial Intelligence Innovation with Enterprise-Grade Solution for Managing Entire Machine Learning Lifecycle",
    "body": "SAN JOSE, Calif.--(BUSINESS WIRE)--\n\nNew HPE Machine Learning (ML) Ops solution speeds time-to-value for AI from months to days and brings DevOps ... (shortened for brevity)",
    "source": {
        "uri": "finance.yahoo.com",
        "dataType": "news",
        "title": "Yahoo Finance"
    },
    "authors": [],
    "image": null,
    "eventUri": null,
    "sentiment": 0.2705882352941176,
    "wgt": 305825160
}

Retrieving summaries of search results

The QueryArticlesIter class is great for obtaining the list of articles that match certain criteria. In some cases, however, you want a summary of the search results instead. Examples of such summaries are the top mentioned concepts, the top keywords, the timeline of the results and the top news sources.

We call such summaries aggregates. To obtain them, use the QueryArticles class. Its constructor accepts the same arguments as QueryArticlesIter, plus an additional requestedResult argument, which can be an instance of any of these classes:

  • RequestArticlesInfo - use to retrieve a list of articles
  • RequestArticlesUriWgtList - returns a long list of article uris
  • RequestArticlesTimeAggr - returns the time distribution of search results
  • RequestArticlesConceptAggr - returns the top concepts mentioned in the search results
  • RequestArticlesKeywordAggr - returns the top keywords matching the search results
  • RequestArticlesCategoryAggr - returns the top categories matching the search results
  • RequestArticlesSourceAggr - returns the top news sources that authored the search results
  • RequestArticlesConceptGraph - returns which top mentioned concepts frequently co-occur with other concepts
  • RequestArticlesDateMentionAggr - returns which dates are frequently mentioned in the search results

To execute a search with the QueryArticles class, call the execQuery method of your EventRegistry instance:

er.execQuery(q)

An example looks like this:

In [11]:
q = QueryArticles(
    conceptUri = er.getConceptUri("tesla"),
    requestedResult = RequestArticlesTimeAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
{
    "timeAggr": {
        "usedResults": 13298,
        "totalResults": 13298,
        "results": [
            {
                "date": "2019-08-10",
                "count": 153
            },
            {
                "date": "2019-08-11",
                "count": 234
            },
            {
                "date": "2019-08-12",
                "count": 363
            },
            {
                "date": "2019-08-13",
                "count": 330
            },
            {
                "date": "2019-08-14",
                "count": 219
            },
            ... (shortened for brevity)
        ]
    }
}
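The timeAggr result is a list of date/count entries, which makes it easy to post-process. A minimal sketch (the helper name is hypothetical) that finds the busiest day and sums the counts, using only the five days shown in the shortened output above, so the numbers describe just that excerpt:

```python
# The five entries shown above; the real result contains more days
time_aggr = {
    "timeAggr": {
        "results": [
            {"date": "2019-08-10", "count": 153},
            {"date": "2019-08-11", "count": 234},
            {"date": "2019-08-12", "count": 363},
            {"date": "2019-08-13", "count": 330},
            {"date": "2019-08-14", "count": 219},
        ]
    }
}


def peak_day(res: dict) -> tuple:
    """Return (date, count) for the day with the most articles."""
    best = max(res["timeAggr"]["results"], key=lambda r: r["count"])
    return best["date"], best["count"]


total = sum(r["count"] for r in time_aggr["timeAggr"]["results"])
```

For this excerpt, peak_day(time_aggr) gives ("2019-08-12", 363) and total is 1299; on the live result you would call peak_day(res).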
In [12]:
q = QueryArticles(
    conceptUri = er.getConceptUri("tesla"),
    requestedResult = RequestArticlesConceptAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
{
    "conceptAggr": {
        "warning": "Due to large number of results, the information was computed only on a sample of 10,000 articles",
        "usedResults": 10000,
        "totalResults": 13296,
        "results": [
            {
                "uri": "http://en.wikipedia.org/wiki/Elon_Musk",
                "type": "person",
                "label": {
                    "eng": "Elon Musk"
                },
                "score": 100
            },
            {
                "uri": "http://en.wikipedia.org/wiki/United_States",
                "type": "loc",
                "label": {
                    "eng": "United States"
                },
                "location": {
                    "type": "country",
                    "label": {
                        "eng": "United States"
                    }
                },
                "score": 72.99863387978142
            },
            {
                "uri": "http://en.wikipedia.org/wiki/Electric_car",
                "type": "wiki",
                "label": {
                    "eng": "Electric car"
                },
                "score": 67.48633879781421
            },
            ... (shortened for brevity)
        ]
    }
}
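Note the warning in this result: with many matches, the aggregate is computed on a sample, so usedResults is smaller than totalResults. A small hypothetical helper that reports whether an aggregate was sampled, how much of the data it covers, and the top concept labels (assuming, as in the output above, that results are already sorted by score):

```python
def describe_aggr(aggr: dict, top: int = 3, lang: str = "eng") -> dict:
    """Summarize an aggregate: whether it was sampled, its coverage, and the top-N labels."""
    used, total = aggr["usedResults"], aggr["totalResults"]
    return {
        "sampled": used < total,
        "coverage": used / total if total else 1.0,
        # Assumes results are sorted by descending score, as in the output above
        "top": [c["label"][lang] for c in aggr["results"][:top]],
    }


# Values copied from the conceptAggr output above (results shortened)
concept_aggr = {
    "usedResults": 10000,
    "totalResults": 13296,
    "results": [
        {"uri": "http://en.wikipedia.org/wiki/Elon_Musk",
         "label": {"eng": "Elon Musk"}, "score": 100},
        {"uri": "http://en.wikipedia.org/wiki/United_States",
         "label": {"eng": "United States"}, "score": 72.99863387978142},
        {"uri": "http://en.wikipedia.org/wiki/Electric_car",
         "label": {"eng": "Electric car"}, "score": 67.48633879781421},
    ],
}
summary = describe_aggr(concept_aggr)
```

Here the coverage is about 75%, so the concept scores are estimates from a sample; on the live result you would call describe_aggr(res["conceptAggr"]).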

Searching for events

Events are collections of articles that we automatically identify as discussing the same real-world happening. Examples of events include the launch of the new iPhone on Sept 10, 2019, Trump firing John Bolton as national security adviser, and Nissan's CEO resigning.

Searching for events is very similar to searching for articles. There are two main classes available to do the search - QueryEventsIter and QueryEvents.

You should use QueryEventsIter in order to retrieve the list of events that match a certain set of conditions.

QueryEvents class should be used to obtain various kinds of summaries about the events that match the search conditions.

Both classes accept several filters in the constructor, such as:

  • keywords - find events that mention the keywords or phrases
  • conceptUri - find events that mention the concept(s)
  • categoryUri - find events that are about the given category(s)
  • sourceUri - find events covered by the given publisher(s)
  • sourceLocationUri - find events covered by publishers located in the given location
  • authorUri - find events covered by articles written by the given author(s)
  • locationUri - find events that mention the given location in the dateline
  • lang - find events reported in the given language(s)
  • dateStart - find events that occurred on the given date or later (in the YYYY-MM-DD format)
  • dateEnd - find events that occurred on or before the given date (in the YYYY-MM-DD format)
  • keywordsLoc - if keywords are provided, where to search for them: title or body (default)
  • minSentiment, maxSentiment - minimum and maximum sentiment of the events (from -1 to 1)
  • minArticlesInEvent, maxArticlesInEvent - limit events to those covered by a certain number of articles
  • startSourceRankPercentile - starting percentile of the sources that should cover the event (default: 0). Value should be in range 0-90 and divisible by 10.
  • endSourceRankPercentile - ending percentile of the sources that should cover the event (default: 100). Value should be in range 10-100 and divisible by 10.
  • ignoreKeywords, ignoreConceptUri, ignoreCategoryUri, ... - from the events that match the rest of the conditions, exclude those that match any of the provided filters
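The source rank percentile filters have a few value rules. To catch mistakes before sending a request, you can check them locally. The sketch below encodes the rules listed above. `check_source_rank_percentiles` is a hypothetical helper, not part of the SDK. The requirement that the start be lower than the end is an added assumption that the list does not state.

```python
def check_source_rank_percentiles(start: int = 0, end: int = 100) -> None:
    """Raise ValueError if the percentile pair violates the documented rules.

    Hypothetical helper, not part of the eventregistry SDK. The range and
    divisibility rules come from the filter list above; the start < end
    check is an added assumption.
    """
    if not (0 <= start <= 90 and start % 10 == 0):
        raise ValueError("startSourceRankPercentile must be 0-90 and divisible by 10")
    if not (10 <= end <= 100 and end % 10 == 0):
        raise ValueError("endSourceRankPercentile must be 10-100 and divisible by 10")
    if start >= end:
        raise ValueError("startSourceRankPercentile must be below endSourceRankPercentile")


check_source_rank_percentiles(0, 30)      # valid pair, no exception
try:
    check_source_rank_percentiles(0, 25)  # 25 is not divisible by 10
except ValueError as err:
    print(err)
```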

When multiple filters are specified, the results have to match all of the provided filters. For example, when keywords and sources are specified, the results will be events covered by these sources that mention the provided keywords.

If you want a search where it is enough for any of the specified filters to match (a Boolean OR), you will have to use the Advanced Query Language.
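The following toy example makes the difference concrete. It runs locally and does not call the API. The event dictionaries, their "sources" field and the `matches` helper are all hypothetical. Constructor filters behave like Python's `all(...)`, while an `$or` in the Advanced Query Language behaves like `any(...)`.

```python
# Toy events with only the fields needed for the illustration (hypothetical data)
events = [
    {"uri": "e1", "concepts": {"Apple Inc."}, "sources": {"forbes.com"}},
    {"uri": "e2", "concepts": {"Apple Inc."}, "sources": {"bbc.co.uk"}},
    {"uri": "e3", "concepts": {"Samsung"}, "sources": {"forbes.com"}},
]


def matches(event, field, value):
    """True if `value` is among the event's values for `field` (hypothetical)."""
    return value in event[field]


conditions = [("concepts", "Apple Inc."), ("sources", "forbes.com")]

# Constructor filters: every condition must hold (Boolean AND)
and_hits = [e["uri"] for e in events if all(matches(e, f, v) for f, v in conditions)]
# "$or" in the Advanced Query Language: any condition may hold
or_hits = [e["uri"] for e in events if any(matches(e, f, v) for f, v in conditions)]

print(and_hits)  # ['e1']
print(or_hits)   # ['e1', 'e2', 'e3']
```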

An example query could look like this:

In [14]:
q = QueryEventsIter(conceptUri = er.getConceptUri("Apple"))
for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent = 4))
{
    "uri": "eng-5059598",
    "concepts": [
        {
            "uri": "http://en.wikipedia.org/wiki/IPhone",
            "type": "wiki",
            "score": 100,
            "label": {
                "eng": "IPhone"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Apple_Inc.",
            "type": "org",
            "score": 92,
            "label": {
                "eng": "Apple Inc."
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Smartphone",
            "type": "wiki",
            "score": 32,
            "label": {
                "eng": "Smartphone"
            }
        },
        ... (shortened for brevity)
    ],
    "eventDate": "2019-09-10",
    "totalArticleCount": 3776,
    "title": {
        "eng": "Tepid reaction for Apple's new iPhones in China amid tough competition",
        "spa": "Todo lo que se sabe del nuevo iPhone 11",
        "zho": "\u860b\u679c911\u65b0\u54c1\u767c\u8868 \u6697\u85cf\u9a5a\u559c - \u8ca1\u7d93\u8981\u805e",
        "rus": "Apple \u0440\u0435\u0437\u043a\u043e \u0441\u043d\u0438\u0437\u0438\u043b\u0430 \u0446\u0435\u043d\u044b \u043d\u0430 iPhone"
    },
    "summary": {
        "eng": "HANGZHOU/SHANGHAI (Reuters) - A lower price tag and new features may not be enough for Apple Inc (AAPL.O) to win customers for its newly-launched iPhone 11 series ... (shortened for brevity)",
        "spa": "Filtraciones y m\u00e1s filtraciones, pero hasta la jornada previa a la presentaci\u00f3n de los nuevos iPhone nadie ha podido mostrar una fotograf\u00eda o v... (shortened for brevity)",
        "zho": "\u860b\u679c\u79cb\u5b63\u767c\u8868\u6703\u5c07\u5728\u53f0\u7063\u6642\u95939\u670811\u65e5\u51cc\u66681\u9ede\u767b\u5834\uff0c\u9810\u671f\u5c07\u767c\u8868... (shortened for brevity)",
        "rus": "\u041d\u043e\u0432\u044b\u0439 iPhone \u043d\u0430 \u0441\u0442\u0430\u0440\u0442\u0435 \u043f\u0440\u043e\u0434\u0430\u0436 \u0431\u0443\u0434\u0435\u0442 ... (shortened for brevity)"
    },
    "location": {
        "type": "place",
        "label": {
            "eng": "Cupertino, California"
        },
        "country": {
            "type": "country",
            "label": {
                "eng": "United States"
            }
        }
    },
    "categories": [
        {
            "uri": "dmoz/Computers/Systems",
            "label": "dmoz/Computers/Systems",
            "wgt": 17
        },
        {
            "uri": "dmoz/Computers/Systems/Apple",
            "label": "dmoz/Computers/Systems/Apple",
            "wgt": 21
        },
        {
            "uri": "dmoz/Computers/Hardware/Retailers",
            "label": "dmoz/Computers/Hardware/Retailers",
            "wgt": 21
        },
        {
            "uri": "news/Technology",
            "label": "news/Technology",
            "wgt": 82
        }
    ],
    "articleCounts": {
        "eng": 1850,
        "spa": 509,
        "zho": 508,
        "rus": 231
    },
    "sentiment": 0.1607843137254903,
    "wgt": 3776
}
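Each returned event is a plain Python dictionary, so you can pull out the fields you need with ordinary dictionary access. The minimal sketch below assumes the field names shown in the output above. `summarize_event` is a hypothetical helper. If the requested language is missing, the helper falls back to the first available title.

```python
# A trimmed copy of the event returned above (only the fields used here)
event = {
    "uri": "eng-5059598",
    "eventDate": "2019-09-10",
    "totalArticleCount": 3776,
    "title": {"eng": "Tepid reaction for Apple's new iPhones in China amid tough competition"},
    "concepts": [
        {"uri": "http://en.wikipedia.org/wiki/IPhone", "score": 100, "label": {"eng": "IPhone"}},
        {"uri": "http://en.wikipedia.org/wiki/Apple_Inc.", "score": 92, "label": {"eng": "Apple Inc."}},
        {"uri": "http://en.wikipedia.org/wiki/Smartphone", "score": 32, "label": {"eng": "Smartphone"}},
    ],
    "articleCounts": {"eng": 1850, "spa": 509, "zho": 508, "rus": 231},
}


def summarize_event(event: dict, lang: str = "eng") -> dict:
    """Pull a few commonly used fields out of an event dictionary (hypothetical helper)."""
    # Fall back to any available title if the requested language is missing
    title = event["title"].get(lang) or next(iter(event["title"].values()))
    top_concept = max(event["concepts"], key=lambda c: c["score"])
    # Language with the largest number of articles about the event
    main_lang = max(event["articleCounts"], key=event["articleCounts"].get)
    return {
        "uri": event["uri"],
        "date": event["eventDate"],
        "title": title,
        "topConcept": top_concept["label"].get(lang),
        "mainLang": main_lang,
        "articles": event["totalArticleCount"],
    }


print(summarize_event(event))
```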

Retrieving a list of articles about an event

To retrieve the list of articles that discuss a single event, use the QueryEventArticlesIter class. The class requires the eventUri value, which is the unique ID of the event. You can also specify constraints that determine which subset of the event's articles to retrieve:

  • lang - the language(s) of the articles
  • keywords - keywords that should be mentioned in the articles
  • conceptUri - concepts that should be mentioned in the articles
  • categoryUri - the category that should be assigned to the articles
  • sourceLocationUri - the location of the articles' publishers
  • authorUri - the author(s) of the articles
  • locationUri - the location that should be mentioned in the article's dateline
  • dateStart - the date on or after which the articles should be published
  • dateEnd - the date on or before which the articles should be published
  • keywordsLoc - if keywords are set, where to search for them: body (default) or title
  • startSourceRankPercentile - the minimum source rank of the returned articles (0-90, divisible by 10)
  • endSourceRankPercentile - the maximum source rank of the returned articles (10-100, divisible by 10)
  • minSentiment - the minimum sentiment of the articles (between -1 and 1)
  • maxSentiment - the maximum sentiment of the articles (between -1 and 1)

An example for the event above could be:

In [18]:
q = QueryEventArticlesIter("eng-5059598",
    lang = "eng",
    sourceLocationUri = er.getLocationUri("United States"),
    minSentiment = 0.2,
    endSourceRankPercentile = 30)
for art in q.execQuery(er, maxItems = 2, returnInfo = ReturnInfo(articleInfo = ArticleInfoFlags(bodyLen=300))):
    print(art)
{'uri': '1252630658', 'lang': 'eng', 'isDuplicate': False, 'date': '2019-09-10', 'time': '19:53:00', 'dateTime': '2019-09-10T19:53:00Z', 'dataType': 'news', 'sim': 0.772549033164978, 'url': 'https://www.tomsguide.com/news/iphone-11-pro-price-specs', 'title': 'iPhone 11 Pro Hopes to Wow Users With Triple Cameras, Better Performance', 'body': "Extra cameras, design tweaks highlight this year's high-end iPhones\n\nWith two of the new iPhones that were unveiled today (Sept. 10), Apple is going pro.\n\nThe iPhone 11 Pro and iPhone 11 Pro Max are Apple's latest high-end smartphones, replacing last year's iPhone XS and XS Max in the iPhone lineup. ...", 'source': {'uri': 'tomsguide.com', 'dataType': 'news', 'title': "Tom's Guide"}, 'authors': [], 'image': 'https://cdn.mos.cms.futurecdn.net/taYs8gxnegn5fGt8MEzywN-1200-80.jpeg', 'eventUri': 'eng-5059598', 'sentiment': 0.3647058823529412, 'wgt': 197}
{'uri': '1253023425', 'lang': 'eng', 'isDuplicate': False, 'date': '2019-09-11', 'time': '03:54:00', 'dateTime': '2019-09-11T03:54:00Z', 'dataType': 'news', 'sim': 0.7686274647712708, 'url': 'https://www.forbes.com/sites/gordonkelly/2019/09/10/apple-iphone-11-vs-iphone-11-pro-whats-the-difference-new-iphone-upgrade/', 'title': "Apple iPhone 11 Vs iPhone 11 Pro: What's The Difference?", 'body': "For the second year in a row, Apple's most exciting new iPhone is its cheapest. The confusingly named iPhone 11 replaces the iPhone XR and, pound for pound, it blows the socks of the iPhone 11 Pro, the confusingly named successor to the iPhone XS. This is why.\n\nDisplays - HD and XDR but not ...", 'source': {'uri': 'forbes.com', 'dataType': 'news', 'title': 'Forbes'}, 'authors': [], 'image': 'https://thumbor.forbes.com/thumbor/600x300/https%3A%2F%2Fblogs-images.forbes.com%2Fgordonkelly%2Ffiles%2F2019%2F09%2FScreenshot-2019-09-10-at-22.27.17-Edited.png', 'eventUri': 'eng-5059598', 'sentiment': 0.2078431372549019, 'wgt': 196}
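The iterator yields plain dictionaries, so ordinary Python is enough to post-process the articles, for example after collecting them with `list(q.execQuery(...))`. The sketch below uses trimmed copies of the two articles printed above. It checks that they respect the `minSentiment = 0.2` filter, then lists them from most to least positive.

```python
# Trimmed copies of the two articles printed above
articles = [
    {"uri": "1252630658",
     "title": "iPhone 11 Pro Hopes to Wow Users With Triple Cameras, Better Performance",
     "source": {"uri": "tomsguide.com", "title": "Tom's Guide"},
     "sentiment": 0.3647058823529412},
    {"uri": "1253023425",
     "title": "Apple iPhone 11 Vs iPhone 11 Pro: What's The Difference?",
     "source": {"uri": "forbes.com", "title": "Forbes"},
     "sentiment": 0.2078431372549019},
]

# Every returned article should satisfy the minSentiment = 0.2 filter
assert all(a["sentiment"] >= 0.2 for a in articles)

# Sort from most to least positive and print one line per article
ranked = sorted(articles, key=lambda a: a["sentiment"], reverse=True)
for art in ranked:
    print(f'{art["sentiment"]:+.2f}  {art["source"]["title"]}: {art["title"]}')
```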

Searching for events using complex queries

As with article search, every filter you add to QueryEventsIter can only narrow down the set of matching events. If you want a more complex query with a Boolean OR between two different types of filters, you have to use the Advanced Query Language.

The syntax is the same as when searching for articles. An example of such a query could look like this:

In [24]:
qStr = """{
    "$query": {
        "$or": [
            { "locationUri": "%s" },
            { 
                "categoryUri": "%s",
                "conceptUri": "%s"
            }
        ]
    }
}""" % (er.getLocationUri("Washington"), er.getCategoryUri("politics"), er.getConceptUri("Trump"))
print(qStr)

q = QueryEventsIter.initWithComplexQuery(qStr)
for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent = 4))
{
    "$query": {
        "$or": [
            { "locationUri": "http://en.wikipedia.org/wiki/Washington_(state)" },
            {
                "categoryUri": "news/Politics",
                "conceptUri": "http://en.wikipedia.org/wiki/Donald_Trump"
            }
        ]
    }
}
{
    "uri": "eng-5028293",
    "concepts": [
        {
            "uri": "http://en.wikipedia.org/wiki/Boris_Johnson",
            "type": "person",
            "score": 100,
            "label": {
                "eng": "Boris Johnson"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Brexit",
            "type": "wiki",
            "score": 86,
            "label": {
                "eng": "Brexit"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Parliament",
            "type": "wiki",
            "score": 79,
            "label": {
                "eng": "Parliament"
            }
        },
        ... (shortened for brevity)
    ],
    "eventDate": "2019-08-27",
    "totalArticleCount": 2800,
    "title": {
        "eng": "Government asks Queen to suspend Parliament",
        "fra": "Brexit: Hugh Grant \u00e9trille Boris Johnson et lui demande d'aller \" se faire foutre \"",
        "zho": "\u82f1\u56fd\u8131\u6b27\uff1a\u8bae\u4f1a\u88ab\u505c\u6446\u5982\u4f55\u963b\u6b62\u65e0\u534f\u8bae\u786c\u8131",
        "por": "Brexit a f\u00f3rceps - 30/08/2019 - Opini\u00e3o - Folha"
    },
    "summary": {
        "eng": "The government has asked the Queen to suspend Parliament just days after MPs return to work in September... (shortened for brevity)",
        "fra": "L'acteur britannique a vivement r\u00e9agi sur Twitter \u00e0 la d\u00e9cision du Premier ministre ... (shortened for brevity)",
        "zho": "\u8fd9\u662f\u5916\u90e8\u94fe\u63a5\uff0c\u6d4f\u89c8\u5668\u5c06\u6253\u5f00\u53e6\u4e00\u4e2a\u7a97... (shortened for brevity)",
        "por": "Faltando apenas dois meses para o fim do prazo acordado para o brexit, a sa\u00edda do Reino Unido da ... (shortened for brevity)"
    },
    "location": {
        "type": "place",
        "label": {
            "eng": "London"
        },
        "country": {
            "type": "country",
            "label": {
                "eng": "United Kingdom"
            }
        }
    },
    "categories": [
        {
            "uri": "dmoz/Society/Government",
            "label": "dmoz/Society/Government",
            "wgt": 11
        },
        {
            "uri": "dmoz/Society/Politics",
            "label": "dmoz/Society/Politics",
            "wgt": 12
        },
        {
            "uri": "dmoz/Society/Government/Parliaments_and_Legislatures",
            "label": "dmoz/Society/Government/Parliaments and Legislatures",
            "wgt": 24
        },
        {
            "uri": "news/Politics",
            "label": "news/Politics",
            "wgt": 91
        }
    ],
    "articleCounts": {
        "eng": 827,
        "fra": 341,
        "zho": 325,
        "por": 204
    },
    "sentiment": -0.05882352941176472,
    "wgt": 2800
}
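Filling a JSON string with `%` formatting works for these URIs. However, a value containing a double quote or a backslash would produce invalid JSON. A sketch of an alternative is to build the same query as a Python dictionary and serialize it with the standard `json.dumps`. The resulting `qStr` can then be passed to `QueryEventsIter.initWithComplexQuery`, as in the cell above. The URIs below are copied from the printed query. Printing the query first is also a cheap way to check what the lookups resolved to: in the output above, "Washington" resolved to the U.S. state, not to Washington, D.C. Conditions placed in the same object are meant to hold together, as in the second branch of the original query.

```python
import json

# URIs as resolved in the printed query above; in practice you would obtain them
# with er.getLocationUri(), er.getCategoryUri() and er.getConceptUri()
location_uri = "http://en.wikipedia.org/wiki/Washington_(state)"
category_uri = "news/Politics"
concept_uri = "http://en.wikipedia.org/wiki/Donald_Trump"

query = {
    "$query": {
        "$or": [
            {"locationUri": location_uri},
            # both conditions in this object must hold for this branch to match
            {"categoryUri": category_uri, "conceptUri": concept_uri},
        ]
    }
}
qStr = json.dumps(query, indent=4)  # properly escaped JSON string
print(qStr)
```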

Retrieving summaries of search results

In addition to obtaining the list of events that match the search conditions, you can also obtain various summaries of the search results. To do so, use the QueryEvents class. It accepts the same filtering parameters as the QueryEventsIter class, but also accepts the requestedResult parameter, which should be set to one of the following values:

  • RequestEventsInfo - returns a list of events
  • RequestEventsUriWgtList - returns a long list of event URIs that match search results
  • RequestEventsTimeAggr - retrieves a time distribution of events in search results
  • RequestEventsKeywordAggr - retrieves top keywords in the events that match search conditions
  • RequestEventsLocAggr - retrieves the locations where the events happened
  • RequestEventsLocTimeAggr - retrieves the locations and times when the events happened
  • RequestEventsConceptAggr - retrieves the top concepts mentioned in the events
  • RequestEventsConceptGraph - retrieves the top concepts and their co-occurrences
  • RequestEventsSourceAggr - retrieves the top sources in the events
  • RequestEventsDateMentionAggr - retrieves the top dates mentioned in the events
  • RequestEventsCategoryAggr - retrieves the top categories in the events

In [22]:
# what is being mentioned the most in the events about China and US?
q = QueryEvents(
    conceptUri = QueryItems.AND([er.getConceptUri("China"), er.getConceptUri("United States")]),
    requestedResult = RequestEventsConceptAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
{
    "conceptAggr": {
        "usedResults": 14369,
        "totalResults": 14369,
        "results": [
            {
                "uri": "http://en.wikipedia.org/wiki/United_States_dollar",
                "type": "wiki",
                "label": {
                    "eng": "United States dollar"
                },
                "score": 100
            },
            {
                "uri": "http://en.wikipedia.org/wiki/Market_share",
                "type": "wiki",
                "label": {
                    "eng": "Market share"
                },
                "score": 66.71204188481676
            },
            {
                "uri": "http://en.wikipedia.org/wiki/Europe",
                "type": "loc",
                "label": {
                    "eng": "Europe"
                },
                "location": null,
                "score": 66.57112939416605
            },
            ... (shortened for brevity)
        ]
    }
}
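The aggregate comes back as a nested dictionary, so you can work with it directly. The sketch below uses a trimmed copy of the response above, with JSON `null` written as Python `None`. It returns the highest-scoring concept labels and can optionally keep a single concept type ("person", "loc", "org" and "wiki" all appear in the outputs in this notebook). `top_concepts` is a hypothetical helper. It sorts explicitly rather than relying on the order of the returned list.

```python
# Trimmed copy of the conceptAggr response printed above
res = {
    "conceptAggr": {
        "usedResults": 14369,
        "totalResults": 14369,
        "results": [
            {"uri": "http://en.wikipedia.org/wiki/United_States_dollar", "type": "wiki",
             "label": {"eng": "United States dollar"}, "score": 100},
            {"uri": "http://en.wikipedia.org/wiki/Market_share", "type": "wiki",
             "label": {"eng": "Market share"}, "score": 66.71204188481676},
            {"uri": "http://en.wikipedia.org/wiki/Europe", "type": "loc",
             "label": {"eng": "Europe"}, "location": None, "score": 66.57112939416605},
        ],
    }
}


def top_concepts(res, n=10, concept_type=None, lang="eng"):
    """Return (label, score) pairs for the n highest-scoring concepts (hypothetical helper).

    If concept_type is given, keep only concepts of that type.
    """
    items = res["conceptAggr"]["results"]
    if concept_type is not None:
        items = [c for c in items if c["type"] == concept_type]
    items = sorted(items, key=lambda c: c["score"], reverse=True)[:n]
    return [(c["label"][lang], round(c["score"], 1)) for c in items]


print(top_concepts(res, n=2))                  # [('United States dollar', 100), ('Market share', 66.7)]
print(top_concepts(res, concept_type="loc"))   # [('Europe', 66.6)]
```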
In [23]:
# what are the top categories in recent events about AI?
q = QueryEvents(
    conceptUri = er.getConceptUri("artificial intelligence"),
    requestedResult = RequestEventsCategoryAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
{
    "categoryAggr": {
        "usedResults": 3493,
        "totalResults": 3493,
        "results": [
            {
                "uri": "news/Technology",
                "label": "news/Technology",
                "count": 1174
            },
            {
                "uri": "news/Business",
                "label": "news/Business",
                "count": 938
            },
            {
                "uri": "news/Politics",
                "label": "news/Politics",
                "count": 94
            },
            {
                "uri": "news/Arts_and_Entertainment",
                "label": "news/Arts and Entertainment",
                "count": 80
            },
            ... (shortened for brevity)
        ]
    }
}
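Category URIs come from more than one taxonomy ("news/..." and "dmoz/..." both appear in the event outputs above). It can help to keep one taxonomy and turn the counts into shares. The sketch below assumes that each "count" is the number of events in the results that carry that category. It divides by "usedResults", assuming the counts refer to the events the aggregate was computed on. Because a single event can carry several categories, as the events above show, the shares need not add up to 100%. `category_shares` is a hypothetical helper.

```python
# Trimmed copy of the categoryAggr response printed above
res = {
    "categoryAggr": {
        "usedResults": 3493,
        "totalResults": 3493,
        "results": [
            {"uri": "news/Technology", "label": "news/Technology", "count": 1174},
            {"uri": "news/Business", "label": "news/Business", "count": 938},
            {"uri": "news/Politics", "label": "news/Politics", "count": 94},
            {"uri": "news/Arts_and_Entertainment", "label": "news/Arts and Entertainment", "count": 80},
        ],
    }
}


def category_shares(res, prefix="news/"):
    """Map each category label to count / usedResults, keeping one taxonomy (hypothetical helper)."""
    aggr = res["categoryAggr"]
    used = aggr["usedResults"]
    return {
        c["label"]: c["count"] / used
        for c in aggr["results"]
        if c["uri"].startswith(prefix)
    }


for label, share in category_shares(res).items():
    print(f"{label:30} {share:6.1%}")
```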