# Natural Language Queries using the semparse demo
The current NLQ demo, which I am calling `semparse` in this document, is not ready for prime time. It gives poor results on common non-NL queries (e.g. "Deep Learning"). We don't have good data on how it performs on NL queries, but anecdotally we've seen it give useful results. As I see it, we have three options for using the existing component in production:

* Improve the `semparse` component until its results match the current customer experience.
* Create a labs.semanticscholar.org site to allow customers to opt-in to semantic parsing.
* Use prefix filtering to opt customers into semantic parsing.

The three options vary widely in cost and customer impact.

## Recommended option

### "Labs" search landing page

I recommend we create a new landing page, or separate site, for natural language queries. This option is relatively cheap and has no impact on existing customers.

This solution would be composed of three new components.

1. A `semparse` subproject in SBT.
   1. Add a build step that produces the serialized `CcgParser`.
   1. Fix up the dependencies (e.g. `jklol.jar`).
   1. Trim the code to the bare essentials.
2. A new page, similar to the homepage, for NL queries.
   1. A new `nql.jsx` page that contains a search bar.
   1. The page routes each search through `semparse`.
   1. (Optional:) It constrains other searches to use `semparse` via a cookie or a new SERP.
3. (Optional:) A route for the new URL.
   1. An ALB or Nginx rule that routes the new URL to the new page.
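
To make the optional cookie constraint concrete, here is a minimal sketch of the per-request decision. It is written in Python for brevity and is not the S2 app's actual code. The cookie name `nlq_opt_in`, the `/labs` path, and the function `use_semparse` are all hypothetical names chosen for illustration.

```python
from typing import Mapping

# Hypothetical names: neither the cookie nor the path exists in the current app.
OPT_IN_COOKIE = "nlq_opt_in"
LABS_PATH = "/labs"


def use_semparse(path: str, cookies: Mapping[str, str]) -> bool:
    """Decide whether a search request should be routed through semparse."""
    # Searches issued from the labs page (or anything under it) always opt in.
    from_labs_page = path == LABS_PATH or path.startswith(LABS_PATH + "/")
    # Searches made elsewhere opt in only if the labs page set the cookie earlier.
    opted_in = cookies.get(OPT_IN_COOKIE) == "1"
    return from_labs_page or opted_in
```

Under these assumptions, a customer who never visits the labs page never carries the cookie, so their searches keep the existing flow.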


#### Pros
* Cheap.
* Preserves existing search experience.

#### Cons
* Does nothing to directly improve the semantic parser.
* Adds some memory pressure (~300 MB) to the existing S2 app.
* Potentially hard for customers to find and try.
* No way to measure success.

## Other options considered

### Productionize the demo parser

This option is very ambiguous. As an engineer with no prior NLU experience, I don't have great insight into how we would address the parser's problems. I believe this is a large research project being done by other folks at AI2, so we may be able to leverage their work.

#### Pros
* Deliberate, measurable improvements to the parser.

#### Cons
* Expensive, potentially very expensive.
* Ambiguous.
* Requires investment in A/B testing, query scoring, etc.

### Prefix filtering

Simply funnel any query that begins with one of a predefined set of prefixes (e.g. "Who", "What", "How"...) to the `semparse` module. Otherwise, execute the normal query flow.
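
The routing rule can be sketched in a few lines. This is an illustration in Python, not the existing search code. The prefix list contains only the three examples named above, and `first_word` and `route_query` are hypothetical helper names. It assumes we match on the whole first word, case-insensitively.

```python
import re

# Only the three example prefixes from this document; the real list is undecided.
NL_PREFIXES = frozenset({"who", "what", "how"})


def first_word(query: str) -> str:
    """Return the lowercased leading run of letters in the query, or '' if none."""
    match = re.match(r"\s*([A-Za-z]+)", query)
    return match.group(1).lower() if match else ""


def route_query(query: str) -> str:
    """Return 'semparse' if the query starts with a known prefix, else 'default'."""
    return "semparse" if first_word(query) in NL_PREFIXES else "default"
```

Matching on the whole first word keeps a query like "Howard Hughes" in the normal flow. It still cannot tell a question from a keyword query, such as a paper title, that happens to begin with "How".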

#### Pros
* Cheapest option.
* Most queries will be unaffected.

#### Cons
* Hard for customers to discover.
* May accidentally catch some non-NL queries.
* No way to measure success.
* Adds some memory pressure (~300 MB) to the existing S2 app.
* Does nothing to directly improve the semantic parser.