Untitled

a guest
Aug 8th, 2016
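import urllib.parse
import urllib.request
import bs4


secFtpBase = 'ftp://ftp.sec.gov/edgar/'
secUserName = 'anonymous'
secPassWord = 'tmo8145@gmail.com'


def downloadSecDocument(cik, accessionNum):
    """
    Downloads the SEC filing via ftp and returns an uncleaned html document object model (dom) as a str

    Cik is used to map a company to a unique id
    Accession Number is a number unique to the filing document
    """
    # Assumption: accessionNum arrives as 18 digits without dashes, and the file on the
    # server is named with the dashed 10-2-6 form plus a '.txt' extension.
    dashedAccessionNum = accessionNum[0:10] + '-' + accessionNum[10:12] + '-' + accessionNum[12:]
    documentFtpLink = secFtpBase + 'data/' + str(cik) + '/' + accessionNum + '/' + dashedAccessionNum + '.txt'

    # requests cannot fetch ftp:// URLs, so embed the login in the URL and let urllib handle it.
    credentials = secUserName + ':' + urllib.parse.quote(secPassWord, safe='') + '@'
    authFtpLink = documentFtpLink.replace('ftp://', 'ftp://' + credentials, 1)
    with urllib.request.urlopen(authFtpLink) as response:
        return response.read().decode('utf-8', errors='replace')

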
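def getBasicFileData(fileStr):
    """
    Returns the basic file data located at the top of the filing before the html document starts

    Returned data includes: accession number, acceptance datetime, form type, report period, company name, cik
    """
    fileLines = fileStr.splitlines()

    # Assumption: the fixed offsets 21, 19 and 28 were meant to skip the labels named below,
    # so the value is taken as whatever follows the label on its line.
    def valueAfter(label):
        for fileLine in fileLines:
            fileLine = fileLine.strip()
            if fileLine.startswith(label):
                return fileLine[len(label):].strip()
        return None

    return {
        'acceptanceDateTime': valueAfter('<ACCEPTANCE-DATETIME>'),
        'accessionNumber': valueAfter('ACCESSION NUMBER:'),
        'submissionType': valueAfter('CONFORMED SUBMISSION TYPE:'),
        # TODO: company name, cik, report period and file date are not extracted yet.
        'companyName': None,
        'cik': None,
        'reportPeriod': None,
        'fileDate': None,
    }

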
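A minimal sketch of how the header fields listed in the docstring above could be pulled out by label instead of by fixed offsets. The sample header text, its values, and the label strings are assumptions modeled on the usual EDGAR submission header layout, and `parseHeaderFields` is a hypothetical helper rather than part of this paste.

```python
# Hypothetical sample of the plain-text header that precedes the HTML in a filing.
# Every value here is made up for illustration.
SAMPLE_HEADER = (
    "<ACCEPTANCE-DATETIME>20160808120000\n"
    "ACCESSION NUMBER:\t\t0000000000-16-000001\n"
    "CONFORMED SUBMISSION TYPE:\t10-K\n"
    "CONFORMED PERIOD OF REPORT:\t20160630\n"
    "FILED AS OF DATE:\t\t20160808\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tEXAMPLE CORP\n"
    "\t\tCENTRAL INDEX KEY:\t\t\t0000000000\n"
)

# Assumed mapping from header label to output key; adjust to the labels in real filings.
HEADER_LABELS = {
    "<ACCEPTANCE-DATETIME>": "acceptanceDateTime",
    "ACCESSION NUMBER:": "accessionNumber",
    "CONFORMED SUBMISSION TYPE:": "submissionType",
    "CONFORMED PERIOD OF REPORT:": "reportPeriod",
    "FILED AS OF DATE:": "fileDate",
    "COMPANY CONFORMED NAME:": "companyName",
    "CENTRAL INDEX KEY:": "cik",
}


def parseHeaderFields(headerText):
    """Return a dict of header values, keeping the first match for each label."""
    fields = {}
    for line in headerText.splitlines():
        stripped = line.strip()
        for label, key in HEADER_LABELS.items():
            if stripped.startswith(label) and key not in fields:
                # Everything after the label, minus the tab padding, is the value.
                fields[key] = stripped[len(label):].strip()
    return fields


if __name__ == "__main__":
    print(parseHeaderFields(SAMPLE_HEADER))
```

Matching on the label and stripping whitespace avoids depending on how many tabs pad each line, which a hard-coded slice offset would.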
def parsefile(fileStr):
    """
    Main function for parsing a 10-K/10-Q

    fileStr is an uncleaned html/css str containing the filing document
    """
    fileDom = bs4.BeautifulSoup(fileStr, 'html.parser')
    return fileDom


def parseTables(fileDom):
    """
    Extracts tables from a beautiful soup obj and returns them in dictionary form
    """
    tableObjs = fileDom.find_all('table')

    # Assumption: key each parsed table by its position in the document.
    return {tableIndex: parseTable(tableObj) for tableIndex, tableObj in enumerate(tableObjs)}


def parseTable(tableObj):
    """
    Given a table in html format, creates a list of lists that represents the table
    """
    tableRows = tableObj.find_all('tr')
    table = []

    # Need to check if row contains headers with <th> tags or data with <td> tags
    # Row might contain headers and data b/c of row headers
    for tableRow in tableRows:
        rowHeaders = tableRow.find_all('th')
        rowData = tableRow.find_all('td')
        # Assumption: header cells come first in the row, followed by data cells.
        table.append([cell.get_text(strip=True) for cell in rowHeaders + rowData])

    return table
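
A dependency-free sketch of the same row-by-row idea, swapping BeautifulSoup for the standard library's `html.parser` so it can be tried without extra installs. `TableRowCollector`, `tableToRows`, and the sample table values are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []           # finished rows, each a list of cell strings
        self.currentRow = None   # cells of the row being read, or None outside a row
        self.currentCell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.currentRow = []
        elif tag in ("th", "td") and self.currentRow is not None:
            self.currentCell = []

    def handle_data(self, data):
        # Text outside a cell (such as indentation between tags) is ignored.
        if self.currentCell is not None:
            self.currentCell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.currentCell is not None and self.currentRow is not None:
            self.currentRow.append("".join(self.currentCell).strip())
            self.currentCell = None
        elif tag == "tr" and self.currentRow is not None:
            self.rows.append(self.currentRow)
            self.currentRow = None
            self.currentCell = None


def tableToRows(tableHtml):
    """Return the table as a list of lists, header and data cells in document order."""
    collector = TableRowCollector()
    collector.feed(tableHtml)
    collector.close()
    return collector.rows


# Made-up table with a header row and row headers, the mixed case noted in the comments above.
SAMPLE_TABLE = """
<table>
  <tr><th>Item</th><th>2015</th><th>2014</th></tr>
  <tr><th>Revenue</th><td>100</td><td>90</td></tr>
  <tr><th>Net income</th><td>12</td><td>10</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in tableToRows(SAMPLE_TABLE):
        print(row)
```

Because cells are appended as they are encountered, a row that mixes `<th>` and `<td>` keeps its cells in document order.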