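import multiprocessing
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# Vectorize ticket descriptions using Doc2Vec
# TaggedDocument expects a list of tokens, so raw strings are split on whitespace first
textList = [TaggedDocument(doc.split() if isinstance(doc, str) else doc, [i]) for i, doc in enumerate(unformattedList)]
numCores = multiprocessing.cpu_count()  # Number of logical processors, used as the worker count for Doc2Vec training
model = Doc2Vec(textList, workers=numCores, vector_size=150)
# The model object is not an array; collect one learned 150-dimensional vector per ticket (gensim >= 4.0; use model.docvecs on 3.x)
x = np.array([model.dv[i] for i in range(len(textList))])

# Vectorize ticket labels/tags using MultiLabelBinarizer (each entry of Tags must be an iterable of labels)
tagList = relevantDF.Tags
vectorizer2 = MultiLabelBinarizer()
y = vectorizer2.fit_transform(tagList)

# Split the feature and label arrays into training and test sets (80/20)
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.20)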