António Branco is the Director of the Portuguese node of the CLARIN research infrastructure. He is a professor of language science and technology at the University of Lisbon, where he founded and now heads the Natural Language and Speech Group (NLX Group) of the Department of Informatics. He is the (co-)author of over 150 publications in the area of language science and technology and has participated in and coordinated several national and international R&D projects. He was the coordinator of the European project METANET4U, part of the META-NET R&D network of excellence. He is a member of the META-NET Executive Board and the first author of the White Paper on the Portuguese Language in the Digital Age.
He is coordinating the QTLeap project, a European research project on quality machine translation through deep language engineering approaches.
With or without meaning? With or without funding?: Cycles in language technology research and what we may learn from them
As language technology has now been an increasingly active research area for long enough, some six decades, it lends itself to a meta-research exercise: identifying the successive epistemological and institutional paradigms upon which its progress may have been based.
In this talk, I will seek to support and share the view that, in line with the structure of the scientific (r)evolutions that other research areas go through, scientific evolution here too seems to proceed through a succession of different epistemological paradigms and institutional support schemes. Interestingly, this progression appears to exhibit a somewhat cyclic shape.
There was an initial paradigm under which, no matter how remote it might be from the natural language processing task of interest, the representation of meaning was assumed to be the ultimate cornerstone of the technology to be researched.
From the mid-nineties onward, given the stalemate of endless new proposals of yet another fragment of some logic to represent natural language meaning, with no clearly visible impact on the progress of the area, and with statistical machine translation as one of its major driving applications, a new paradigm rapidly spread, based initially on the challenging assumption that research could very well advance without resorting to semantic representation.
Some twenty years later, given the stalemate of endless new proposals of yet another cocktail of statistical techniques, with no clearly perceived cumulative impact on the progress of the area beyond a few hundredths of a point of improvement measured on the specific dataset and domain at hand, this is the paradigm that is nowadays giving way to a comeback of the view that semantic representation is key to leveraging the progress of research in language technology.
It is worth noting, though, that this renaissance of the primacy of semantic representation is not a mere return to the logic-based representation of meaning, but rather a move to a richer view of meaning representation that crucially also encompasses distributional semantics, a vector-based representation of meaning.
Interestingly, on the institutional side, a similarly cyclic evolution seems to emerge in the support that this research area has received at the European level, and its shape appears to be largely in phase with the paradigm shifts that occurred on the epistemological side.
Given the promotion of multilingualism inscribed in its core mission, the European Commission's interest in language technology was initially focused on machine translation as advanced support for its translation and multilingual services, which represented around one third of the Commission's running costs. This interest materialized in the EUROTRA project: a massive, in-house research and development project aimed at setting up machine translation software to serve its internal needs, run under the coordination of the Commission's services, with major components commissioned to third parties identified by the Commission in the different member states.
By the mid-nineties, as this machine translation project came to an end, the Commission broadened its interest to the full range of the language technology area. Treating it as a first-rank scientific area, and also a political priority, on a par with any other discipline worth supporting, the Commission backed a series of full-scale, competitive funding schemes and opportunities aimed at the progress of language technology, aligned with similar schemes used for other research areas.
After three R&D framework programmes of such support, in the current H2020 programme the European Commission's funding of language technology is again focused on machine translation, and again narrowed down to a particular in-house project, CEF.AT, which is being run under the coordination of the Commission, with components commissioned to third parties, with the major goal of addressing the specific needs of its services for multilingual technology.
While proceeding with the meta-analysis exercise sketched above, and in line with the objectives of the present workshop on "Future and Emerging Trends in Language Technologies" (Seville, December 1-2, 2016), in this talk I will ponder and seek to foster discussion on which long-term progression lines may be discerned within the scope of such an exercise, and on whether and how these lines can help us devise future and emerging trends in our area: trends which, on the epistemological side, should thus be pursued, and, on the funding side, fostered.