The DiMSUM shared task at SemEval 2016 is concerned with predicting, given an English sentence, a broad-coverage representation of lexical semantics. The representation consists of two closely connected facets: a segmentation into minimal semantic units, and a labeling of some of those units with semantic classes known as supersenses.

For example, given the POS-tagged sentence

I/PRON googled/VERB restaurants/NOUN in/ADP the/DET area/NOUN and/CONJ Fuji/PROPN Sushi/PROPN came/VERB up/ADV and/CONJ reviews/NOUN were/VERB great/ADJ so/ADV I/PRON made/VERB a/DET carry/VERB out/ADP order/NOUN

the goal is to predict the representation

I googled|v.communication restaurants|n.group in the area|n.location and Fuji_Sushi|n.group came_up|v.communication and reviews|n.communication were|v.stative great so I made_ a carry_out|v.possession _order|v.communication

Noun supersenses start with n., verb supersenses with v., and _ joins tokens within a multiword expression. (carry_out|v.possession and made_order|v.communication are separate MWEs.)
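
To make the structure concrete, here is one way the analysis above could be held in memory; the variable names and the index-based encoding are illustrative only and are not the task's official file format (see the data release for the actual column layout).

    # Illustrative only: one way to hold the analysis of the example in memory.
    # The variable names and index-based encoding are not the official DiMSUM
    # file format; see the data release for the actual column layout.

    tokens = ["I", "googled", "restaurants", "in", "the", "area", "and",
              "Fuji", "Sushi", "came", "up", "and", "reviews", "were",
              "great", "so", "I", "made", "a", "carry", "out", "order"]

    # Minimal semantic units spanning more than one token, as 0-based indices.
    # Note the gappy MWE made_ ... _order, which skips over "a carry_out".
    mwe_groups = [[7, 8],     # Fuji_Sushi
                  [9, 10],    # came_up
                  [19, 20],   # carry_out
                  [17, 21]]   # made_ ... _order (with a gap)

    # Supersense labels, keyed by the first token index of the labeled unit.
    supersenses = {1: "v.communication", 2: "n.group", 5: "n.location",
                   7: "n.group", 9: "v.communication", 12: "n.communication",
                   13: "v.stative", 17: "v.communication", 19: "v.possession"}

    for group in mwe_groups:
        print("_".join(tokens[i] for i in group), supersenses[group[0]])
    # Fuji_Sushi n.group / came_up v.communication / carry_out v.possession / made_order v.communication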

The two facets of the representation are discussed in greater detail below. Systems are expected to produce both facets, though how they do so (e.g., with a pipeline or a joint model) is up to you.
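
As a rough illustration of the pipeline option, the toy sketch below segments MWEs by greedy longest match against a tiny hand-written lexicon and then labels units by lookup. Both lexicons are placeholders invented for this example, not resources provided with the task; a real system would learn these decisions from the training data or model both facets jointly.

    # Toy pipeline sketch: MWE segmentation by greedy longest match against a
    # small lexicon, then supersense labeling by lookup. Both lexicons are
    # placeholders invented for this example.

    MWE_LEXICON = {("fuji", "sushi"), ("came", "up"), ("carry", "out")}
    SUPERSENSE_LEXICON = {"googled": "v.communication", "area": "n.location",
                          "fuji_sushi": "n.group", "came_up": "v.communication",
                          "carry_out": "v.possession"}

    def segment(tokens):
        """Greedily group adjacent tokens that match the MWE lexicon."""
        units, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i].lower(), tokens[i + 1].lower()) in MWE_LEXICON:
                units.append([i, i + 1])
                i += 2
            else:
                units.append([i])
                i += 1
        return units

    def label(tokens, units):
        """Attach a supersense to each unit when the lexicon knows one."""
        forms = ["_".join(tokens[i] for i in unit) for unit in units]
        return [(form, SUPERSENSE_LEXICON.get(form.lower())) for form in forms]

    tokens = "I googled restaurants in the area and Fuji Sushi came up".split()
    print(label(tokens, segment(tokens)))
    # [('I', None), ('googled', 'v.communication'), ..., ('Fuji_Sushi', 'n.group'), ('came_up', 'v.communication')]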

Gold standard training data labeled with the combined representation is provided in two domains: online reviews and tweets. (Rules for using other data resources are given under Data conditions below.) Blind test data will be in these two domains as well as a third, surprise domain. The domain of each sentence will not be indicated as part of the input at test time. The three test domains will have equal weight in the overall system scores (see Scoring below).

Minimal semantic units

The word tokens of the sentence are partitioned into basic units of lexical meaning. Equivalently, where multiple tokens function together as an idiomatic whole, they are grouped together into a multiword expression (MWE). MWEs include: nominal compounds like hot dog; verbal expressions like do away with 'eliminate', make decisions 'decide', kick the bucket 'die'; PP idioms like at all and on the spot 'without planning'; multiword prepositions/connectives like in front of and due to; multiword named entities; and many other kinds.
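
MWE identification is commonly cast as sequence tagging. The sketch below shows a simplified BIO encoding of a contiguous-only segmentation; it deliberately ignores gappy MWEs such as made_ ... _order, which the annotation does allow and which require a richer tag set (see the data documentation for the scheme actually used in the released files).

    # Simplified sketch: encode a contiguous-only MWE segmentation as BIO tags,
    # one tag per token. Gappy MWEs are ignored here.

    def bio_encode(n_tokens, mwe_groups):
        tags = ["O"] * n_tokens
        for group in mwe_groups:
            tags[group[0]] = "B"
            for i in group[1:]:
                tags[i] = "I"
        return tags

    tokens = "Fuji Sushi came up and reviews were great".split()
    print(list(zip(tokens, bio_encode(len(tokens), [[0, 1], [2, 3]]))))
    # [('Fuji', 'B'), ('Sushi', 'I'), ('came', 'B'), ('up', 'I'), ('and', 'O'), ...]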

Refer to this LREC 2014 paper for details of the MWE annotation.

Semantic classes

These are broad-coverage lexical categories known as "supersenses".

Refer to this NAACL 2015 paper for details of the annotation of supersenses on top of MWEs.

Data conditions

There will be three conditions according to which systems will be compared.

To facilitate a controlled comparison of algorithms, systems in the (semi-)supervised closed conditions may use only the data resources specified for those conditions.

Each system submission will specify which of these datasets were used; this determines which competition(s) the system is entered into. Each team is allowed to submit up to 3 systems: one in the supervised closed condition, one in the semi-supervised closed condition, and one in the open condition.

A new test set has been annotated for this task. A blind version (input only) has been released in advance of the evaluation period.

Scoring

Our evaluation script, dimsumeval.py, is bundled with the latest data release. Primarily, it reports F-scores for MWE identification, supersense labeling, and their combination. See the documentation in the script for an overview. Further details of the scoring procedure will be announced at a future time.
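
For intuition only, the snippet below computes a simplified exact-match F-score over labeled units; this is not the metric implemented in dimsumeval.py, which reports MWE identification and supersense labeling scores separately.

    # Illustration only: an exact-match F-score over labeled units, where each
    # unit is a (set of token indices, supersense) pair. This is NOT the metric
    # implemented in dimsumeval.py.

    def f1(gold_units, pred_units):
        gold, pred = set(gold_units), set(pred_units)
        tp = len(gold & pred)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        return 2 * precision * recall / (precision + recall) if tp else 0.0

    gold = {(frozenset({9, 10}), "v.communication"), (frozenset({2}), "n.group")}
    pred = {(frozenset({9, 10}), "v.communication"), (frozenset({2}), "n.artifact")}
    print(f1(gold, pred))  # 0.5: one of the two labeled units matches exactly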

System submission

The deadline for system outputs to be submitted is Jan. 31, 2016. Instructions:

  1. Create a .zip file containing the following (a minimal packaging sketch appears after these instructions):
    • dimsum16.test.pred, your system's output. Each line should correspond to a line in dimsum16.test.blind, with values filled into the appropriate columns. You should run the evaluation script (with a dummy gold standard) to ensure your system output is formatted correctly.
    • submission.csv, filled out with a specification of the resources used for the submission.
  2. Create a new START submission (requires an account). Provide the team name and members, select Task 10, and write a short blurb about your system. Upload the zip file.
  3. After creating your submission, you may update it (both metadata and zip file) up until the deadline. Only the last file upload for that submission will be retained.
  4. Each team may enter up to 3 submissions, one per data condition. Specify the same team name for all of them, and ensure the submission.csv is different for the different conditions. If multiple submissions from the same team fall under the same data condition, the last one will be used for the evaluation.
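
The sketch below shows one way to package the two required files. The two file names come from the instructions above; the archive name and the existence check are our own choices, not part of the official submission procedure.

    # Minimal packaging sketch. The archive name is hypothetical.
    import os
    import zipfile

    REQUIRED = ["dimsum16.test.pred", "submission.csv"]

    def package(archive="dimsum16_submission.zip"):
        missing = [f for f in REQUIRED if not os.path.exists(f)]
        if missing:
            raise FileNotFoundError("missing required file(s): %s" % missing)
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in REQUIRED:
                zf.write(f)
        return archive

    print("wrote", package())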

We plan to make all submissions—including the test set predictions—public at some point subsequent to the task. (If there are institutional barriers to this, please contact the organizers.)

Organization

Please subscribe to https://groups.google.com/group/dimsum16 for announcements about the task.

See the schedule for planned data releases and deadlines.

The organizers are: