langchain
pypdf
unstructured[pdf,docx,ppt]
docx2txt
