- # Natural Language Queries using the semparse demo
- The current natural language query (NLQ) demo, which I am calling `semparse` in this document, is not ready for prime time. It gives poor results on common non-NL queries (e.g. "Deep Learning"). We don't have good data on how it performs on NL queries, but anecdotally we've seen it give useful results. As I see it, we have three options for using the existing component in production:
- * Improve the `semparse` component to match the current customer experience.
- * Create a labs.semanticscholar.org site to allow customers to opt-in to semantic parsing.
- * Use prefix filtering to opt customers into semantic parsing.
- The three options vary widely in cost and customer impact.
- ## Recommended option
- ### "Labs" search landing page.
- I recommend we create a new landing page, or separate site, for natural language queries. This option is relatively cheap and has no impact on existing customers.
- This solution would be composed of three new components.
- 1. A `semparse` subproject in SBT.
- 1. Add a build step to make the serialized CcgParser
- 1. Fix up the dependencies (e.g. jklol.jar)
- 1. Trim the code to the bare essentials.
- 2. A new page, similar to the homepage, for NL queries.
- 1. A new `nql.jsx` page that contains a search bar.
- 1. Routes each search through `semparse`.
- 1. (Optional:) Constrains other searches to use `semparse` via a cookie or new SERP.
- 3. (Optional:) A route for the new URL.
- 1. An ALB or Nginx route for the new URL to the new page.
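The optional cookie-based constraint in component 2 could look roughly like the sketch below. It is written in Python purely for illustration and is not the S2 app's actual code. The cookie name `nlq_opt_in`, the `/labs` path, and the `use_semparse` helper are all hypothetical names chosen for this example.

```python
# Hypothetical names: neither the path nor the cookie is defined in this proposal.
LABS_PATH = "/labs"
OPT_IN_COOKIE = "nlq_opt_in"


def use_semparse(path: str, cookies: dict) -> bool:
    """Decide whether a search request should go through semparse.

    A request qualifies if it comes from the labs landing page itself,
    or if the customer opted in earlier and still carries the cookie.
    """
    on_labs_page = path == LABS_PATH or path.startswith(LABS_PATH + "/")
    opted_in = cookies.get(OPT_IN_COOKIE) == "1"
    return on_labs_page or opted_in
```

Customers who never visit the labs page never receive the cookie, so their searches keep the existing flow.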
- #### Pros
- * Cheap.
- * Preserves existing search experience.
- #### Cons
- * Does nothing to directly improve the semantic parser.
- * Adds some (~300MB) memory pressure to the existing S2 app.
- * Potentially hard for customers to find and try.
- * No way to measure success.
- ## Other options considered
- ### Production-ize demo parser
- This option is poorly defined. As an engineer with no prior NLU experience, I don't have great insight into how we would address the parser's problems. I believe this is a large research project already being pursued by other folks at AI2, so we may be able to leverage that work.
- #### Pros
- * Deliberate, measurable improvements to the parser.
- #### Cons
- * Expensive, potentially very expensive.
- * Ambiguous.
- * Requires investment in A/B testing, query scoring, etc.
- ### Prefix filtering
- Simply funnel any query that begins with a pre-defined set of prefixes (e.g. "Who", "What", "How"...) to the `semparse` module. Otherwise execute the normal query flow.
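A minimal sketch of the prefix check, in Python purely for illustration. The prefix set holds only the three examples named above, and `is_nl_query`, `route_query`, and the two handler arguments are hypothetical names, not existing S2 functions. The check matches on the whole first word, so a query like "Whole genome sequencing" is not mistaken for a "Who" question.

```python
# Assumed prefix set: only the examples given in this document.
NL_PREFIXES = ("who", "what", "how")


def is_nl_query(query: str) -> bool:
    """Return True if the query's first word is one of the NL prefixes."""
    tokens = query.strip().lower().split()
    if not tokens:
        return False
    # Drop trailing punctuation so a bare "What?" still matches.
    first_word = tokens[0].strip("?!.,:;\"'")
    return first_word in NL_PREFIXES


def route_query(query, semparse_search, default_search):
    """Send NL-looking queries to semparse, everything else to the normal flow."""
    handler = semparse_search if is_nl_query(query) else default_search
    return handler(query)
```

A keyword query that happens to start with one of these words would still be routed to `semparse`.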
- #### Pros
- * Cheapest option.
- * Most queries will be unaffected.
- #### Cons
- * Hard for customers to discover.
- * May accidentally catch some non-NL queries.
- * No way to measure success.
- * Adds some (~300MB) memory pressure to the existing S2 app.
- * Does nothing to directly improve the semantic parser.