Best Grep Tool For Os X
Nisus Writer Express, which is now obsolete. (2007-09-26)
Version 0.7.2 (2007-09-26)
Mac OS X is a powerful Unix OS that lets users run many Unix tools such as grep, awk and sed. Unfortunately, these are generally command-line tools that are not very well integrated with the GUI environment.
Introduction
As I wrote elsewhere (cf. my page rtf to UTF-8 text and pTeX), Mac OS X lacks a good word-processing program -- a problem that can be solved in part by using LaTeX -- but it also lacks a good program for searching the text contents of files. The best search program for Classic Mac OS, MgrepApp, still works in the Classic environment of OS X (MgrepApp shows a warning at start-up saying that it has expired, but you can get rid of it by hitting the Return key and clicking the close box of the window); however, it will not work on Intel Macs!
Mac OS X comes with a very powerful and fast search utility, GNU grep, but you have to use Terminal to run it, and this is not easy at all. In addition, GNU grep can only search UTF-8 text files; as many files for our studies, such as the CBETA Taisho files or the SAT Taisho files, are in either Big5-Eten or Shift_JIS, this is very inconvenient (although CBETA is beginning to distribute UTF-8 files as well [but be aware that the CBETA UTF-8 files use Windows line ending characters, so they may cause problems in OS X applications]).
Therefore, I wrote an AppleScript droplet which converts all the files in a folder into UTF-8 files (please see my other page Batch Convert Files to UTF-8), and I wrote another AppleScript droplet which can be used as an interface for GNU grep. Used in combination with TextWrangler and/or Jedit X, this latter droplet can simulate to a certain extent the behavior of MgrepApp: you get a list of results; when you double-click on one item, TextWrangler or Jedit X opens the target file and selects the matched word. It is this droplet, named unix_grep.app, that I would like to present in this page.
The files MUST be in UTF-8 encoding; files with Mac line ending characters cannot be searched, but those with Windows line ending characters can be used (of course, those with Unix line ending characters are preferable).
TextWrangler is a free text editor, very powerful, very fast, and very useful. Unfortunately, it cannot handle styled text.
Jedit X is a shareware text editor (2940 yen or $28); it is also very powerful; it is perhaps a little slower than TextWrangler, but it can handle styled text as well as TextEdit.
I assume in this ReadMe that we are using unix_grep.app to search for words in Taisho text files, and I will take examples from this kind of use. To follow these examples, you should first have converted at least some folders of CBETA files to UTF-8 text files, using my other utility, 'batch_conv2utf8_encoding_check.app'. -- But of course, you can use unix_grep.app for any other UTF-8 text files or folders containing UTF-8 text files.
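Because files with classic Mac line endings cannot be searched and Unix line endings are preferable, it can help to normalize line endings before searching. The following is only a minimal shell sketch, not part of the unix_grep package: the helper name `to_unix_endings` and the two sample files are hypothetical, and it assumes the files are already in UTF-8.

```shell
#!/bin/sh
# Hypothetical helper (not part of unix_grep.app): rewrite a text file in place
# so that it uses Unix (LF) line endings.
to_unix_endings() {
    src=$1
    tmp=$(mktemp) || return 1
    # Count the LF bytes in the file (tr -d ' ' strips the padding some wc print).
    lf_count=$(tr -cd '\n' < "$src" | wc -c | tr -d ' ')
    if [ "$lf_count" -gt 0 ]; then
        # LF is present: assume Windows (CRLF) or Unix endings and drop every CR.
        tr -d '\r' < "$src" > "$tmp"
    else
        # No LF at all: assume classic Mac endings and turn each CR into LF.
        tr '\r' '\n' < "$src" > "$tmp"
    fi
    # Copy the content back rather than moving, to keep the file's permissions.
    cat "$tmp" > "$src" && rm -f "$tmp"
}

# Demonstration on two throwaway files.
demo=$(mktemp -d)
printf 'first line\r\nsecond line\r\n' > "$demo/windows.txt"
printf 'first line\rsecond line\r'     > "$demo/oldmac.txt"
to_unix_endings "$demo/windows.txt"
to_unix_endings "$demo/oldmac.txt"

win_lines=$(wc -l < "$demo/windows.txt" | tr -d ' ')
mac_lines=$(wc -l < "$demo/oldmac.txt" | tr -d ' ')
leftover_cr=$(cat "$demo/windows.txt" "$demo/oldmac.txt" | tr -cd '\r' | wc -c | tr -d ' ')
echo "windows.txt: $win_lines lines, oldmac.txt: $mac_lines lines, CR left: $leftover_cr"
rm -rf "$demo"
```

The sketch guesses the line-ending style from the presence of LF characters, so a file that mixes styles would need a closer look.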
Notes on the new version 0.7.1:
After I released the first version of my unix_grep.app, I asked a friend, Hamid Haji, to test it. He discovered a serious problem with Arabic text files -- in many cases, my droplet fails to find the searched words, and often it crashes. I tried to find the culprit, and discovered that AppleScript's choose from list command is unable to handle long strings of Arabic text. Moreover, AppleScript's droplet mechanism cannot deal correctly with Arabic file names. I think the same is true for other languages (scripts) using ligatures, e.g. Hebrew or Devanagari -- and perhaps other languages/scripts.
To avoid the problem with AppleScript's choose from list, I wrote a new version in which I added a new option, special_scripts. If you set this option to 1, the list selecting window will be skipped, and the result of the search will be opened right away in your default application. -- This option can be used for other languages/scripts as well, if you don't need the list selecting window. It is certainly faster, and more robust than when using the list selecting window. So, if you think that the search result will be very large, it will be probably better to use this option.
As to the problem of file names in Arabic or other scripts using ligatures, the only way to avoid it is either to rename the files with Roman names, or to put the files in a folder having a Roman name (and set the option recursive to 1 if the files are in sub-folders [I think the sub-folders can have Arabic names...]).
The new version is improved in other parts also: it can now accept theoretically any number of files or folders (so that perhaps the save symlink mode is a little less useful in this version [see below]).
I changed also the format of the result file, in which the first line will summarize the search result.
Finally, I changed the two scripts for TextWrangler and Jedit X, named open_file_fromGrepRes.scpt, so that now, it will not only open the target file and select the target line, but select the target word.
I rewrote the following documentation to fit with the new version.
End of notes for version 0.7.1.
Notes for version 0.7.2:
I fixed one bug in the interface: once you had entered '*' in the ext field (standing for 'all files'), it was impossible to get rid of it. This bug, reported by John McRae, is now fixed.
End of notes for version 0.7.2.
Requirements, Contents of the package and How to install:
Requirements:
- OS X 10.4 and later
- Jedit X -- This is optional (the demo version, working for one month, can be downloaded free of charge...)
- a bunch of folders containing UTF-8 text files
Contents:
When expanded, the package that you will download from this page (see at the bottom of the page) will contain:
- unix_grep_AppleScript/
  - (Don't change this file) settings.txt
  - ReadMe.rtf (this file)
  - grep_symlinks_folder (an empty folder)
  - unix_grep.app
  - unix_grep_res.txt (an empty UTF-8 file)
  - Put_in_App_script_folder/
    - for_Jedit_X/
      - open_file_fromGrepRes.scpt
    - for_TextWrangler/
      - open_file_fromGrepRes.scpt
How to install:
To use the two scripts named open_file_fromGrepRes.scpt, one for TextWrangler and the other for Jedit X, you have to copy each of them into the 'Scripts' folder of its application.
For TextWrangler, it is easy:
- Locate the Scripts folder for TextWrangler:
/Users/[your_account]/Library/Application Support/TextWrangler/Scripts/
- Click on the script open_file_fromGrepRes.scpt in the folder for_TextWrangler, press the Option key, and drag the script into that folder.
- You should also set the default file encoding of TextWrangler to Unicode (UTF-8, no BOM) at this point, if it is not already set:
Launch TextWrangler, and choose the menu-item TextWrangler > Preferences; in the left side pane, click Text encodings. At the bottom of the window, set the popup menu under If file's encoding can't be guessed, use: to Unicode (UTF-8, no BOM).
For Jedit X:
- Launch Jedit X, and select Window > Script Window or Macros > Show Script Window in Jedit X to display the Script window.
- Click on the Macro Menu tab of the Script window;
- Drag the script open_file_fromGrepRes.scpt in the folder for_Jedit_X from the Finder to the desired location in the Script window to save it there.
- You will be asked whether you want to copy the script file or the alias file. Click on Copy, and the script file will automatically be saved in the following scripts folder:
/Users/[your_account]/Library/Application Support/Jedit X/scripts/
- For Jedit X too, you should set the default file encoding to Unicode (UTF-8) at this point:
Choose the menu-item Jedit X > Preferences; press the icon Encoding at the top of the window; set the pop-up menu under Default Encoding and Line Endings for Plain Text to Unicode (UTF-8) (and the Line Endings to Unix (lf)).
For details, you can refer to Jedit X's help: Chapter 11.2: 'Script Window', and Chapter 2.4: 'Encoding'.
After you have installed these two scripts, you can place the folder Unix_grep_AppleScript anywhere you want (preferably on your desktop?), but you should NOT change the structure of this folder. Especially, unix_grep.app, the folder grep_symlinks_folder and the text file unix_grep_res.txt should be in the same folder.
How to use:
To see how unix_grep.app works, first, make sure that you have all the needed pieces:
- TextWrangler
- Jedit X (although this is optional, I would recommend downloading it, if you don't have it already...)
- one or a bunch of folders full of text files in UTF-8 (that you may have created with my other utility batch_convert2utf8.app [see the page Batch Convert Files to UTF-8]...) -- for example, a folder named 'T01', containing all the files of volume 1 of the Taisho Canon.
-- Hereafter, I will take folders of the Taisho Canon as example.
and of course
- unix_grep.app
Here are the basic steps:
- Drag and drop your folder 'T01' onto the icon of unix_grep.app.
- A dialog will appear, asking you to enter a search word.
- Enter for example '大自在' (without quotes)
- You will see almost immediately a list selecting dialog, with the title:
Found 2 matches...
with the following prompt:
Choose one to open the file with the default_app... or... Press OK with no selection to save the result.
And in the list selecting window, you will see two lines: one beginning with T01n0022.txt and the other with T01n0081.txt (here, I use the CBETA files as example...).
- For this time, select the first item, the one which begins with T01n0022.txt, and press OK (you can do this by hitting the Down Arrow key once, then the Return key; alternatively, you can select the item with the mouse and double-click on it).
- TextWrangler will launch, open the file T01n0022.txt and select the word '大自在' in the line 403 (if the line contains more than one occurrence of the target word, only the FIRST one will be selected):
T01n0022_p0275b07(03)||其行平等。尊大自在。心念無畏。以一身化無數身。
- The same list selecting window of unix_grep.app will appear again, at the front. This would repeat indefinitely if you don't do either of the following steps. So, to get rid of this list selecting window and return to TextWrangler, click either the Cancel button or the OK button.
- If you click the Cancel button, unix_grep.app will quit without doing anything (and the result of the grep search will be discarded);
- If you click the OK button, the result of the search will be written in a file, the file named unix_grep_res.txt (located in the same folder as unix_grep.app), and this file will be opened by TextWrangler.
- If you don't select any line at the first list selecting window, and press the OK button (or hit the Return key), the file unix_grep_res.txt will be opened by TextWrangler, and unix_grep.app will quit.
- Whether or not you select a line at the first list selecting window, if you press the Cancel button, unix_grep.app will quit, discarding the search result.
Be warned that the result of the search will be overwritten each time in the file unix_grep_res.txt, so that you should close this window each time. If you want to save the result, you will have to save it in another file.
You can use the result in the file unix_grep_res.txt to open the file and select the word of your search.
- Select one line (ALL the line) of the result file that you want, and run the script open_file_fromGrepRes from the AppleScript menu of TextWrangler (or the Macro menu if you use Jedit X).
- The target file will open, and the target word will be selected.
This is the basic use of the application.
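For readers comfortable with Terminal, the basic search above corresponds roughly to running egrep directly on the folder. This is a minimal sketch under stated assumptions: the tiny T01 folder built here is a stand-in (only the one line quoted in the walkthrough is real text, the other file is a placeholder), and the exact command line that unix_grep.app builds internally is not documented here, so this is an approximation rather than the droplet's own code.

```shell
#!/bin/sh
# Stand-in data: a small folder T01 with two files.
work=$(mktemp -d)
mkdir "$work/T01"
printf '%s\n' 'T01n0022_p0275b07(03)||其行平等。尊大自在。心念無畏。以一身化無數身。' \
    > "$work/T01/T01n0022.txt"
printf '%s\n' 'placeholder line without the search term' \
    > "$work/T01/T01n0001.txt"

# -E: extended regular expressions (egrep syntax); -n: print line numbers.
# With several files on the command line, grep prefixes each hit with the file
# name, giving "file:line:text", which is enough to locate the match.
hits=$(cd "$work" && grep -E -n '大自在' T01/*.txt)
printf '%s\n' "$hits"
rm -rf "$work"
```

The file name and line number in each output line play the same role as an item in the list selecting window: they identify which file to open and where the match is.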
How to configure the settings:
To see different possible settings, please open the file named (Don't change this file) settings.txt with TextEdit or any other text editor. You will see the following default settings:
ignore_case: | 0 |
recursive: | 0 |
ext: | txt |
default_app: | TextWrangler |

save_symlink: | 0 |
add_to_symlink: | 0 |
special_scripts: | 0 |
The file (Don't change this file) settings.txt is there only to show you the default setting of the droplet. If you don't need it, you can put it anywhere.
You will see the same list if you double-click on the icon of unix_grep.app. You can change this default setting:
- Double-click on the icon of unix_grep.app: you will see a list selecting window showing the current setting. To leave the setting unchanged, simply hit the Return key without selecting any item, or click the Cancel button.
- If you select an item and hit the Return key, a new dialog will ask you to enter the value you want for the selected item (see below for possible values for each item, and some explanation).
- When you click the OK button, another dialog will ask you: Have you finished your changes? It has three buttons: Cancel, Finished and Not yet... (the default button).
- If you press Not yet..., the same list selecting window will appear, asking to select one item, and this will repeat until you press Finished (or Cancel -- in which case, all the changes made will be discarded...).
- If you press Finished, a new list selecting window will appear: it is simply to confirm or not the changes made. You will either press the OK button, to save the changes, or press the Cancel button to discard any changes.
Now, here are some words for each option:
- ignore_case: 0, that is a case-sensitive search, or 1, a case-insensitive search (note that for kanji searches, ignore_case has no meaning).
- recursive: 0, that is the search will be done only on the first level files in the folder dropped on unix_grep.app, or 1, that is the search will be done in all the files in nested folders in the folder dropped on the application.
- ext: extension of the files to be searched. It can be for example txt, html, xml, or pl [for Perl source code files], etc., or '*'. The last one, '*', means all the extensions. Note that the search will not be done if the files have no extension at all. It is *possible* to search in other kinds of files, for example 'doc' files or 'rtf' files, but the result will be totally garbled and meaningless. You should always specify an extension of text files in UTF-8 encoding (with preferably the Unix line ending characters).
- default_app: This can be either TextWrangler or Jedit X. Jedit X will behave exactly the same way as TextWrangler, although Jedit X is slower to open large files. If you don't have Jedit X, and you set the option default_app to it, the application will quit, with a warning (but I could not test this situation...). -- It seems that Jedit X fails sometimes to open the target file. In such cases, I would recommend to use rather TextWrangler...
Latest note added: -- I think I could fix this problem...
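To make the first three options more concrete, here is a minimal sketch of how such settings could be translated into a command-line search. The mapping is an assumption made for illustration (ignore_case to grep's -i, recursive to the depth of a find traversal, ext to a file-name filter); the function name `search_with_settings` and the sample files are hypothetical, and this is not the droplet's own code.

```shell
#!/bin/sh
# Hypothetical wrapper: arguments are ignore_case, recursive, ext, pattern, folder.
search_with_settings() {
    ignore_case=$1; recursive=$2; ext=$3; pattern=$4; folder=$5
    flags=-En                        # -E: egrep syntax, -n: line numbers
    [ "$ignore_case" = 1 ] && flags="${flags}i"
    depth='-maxdepth 1'              # recursive=0: first-level files only
    [ "$recursive" = 1 ] && depth=''
    # With ext='*' the name filter becomes "*.*": any file that has an extension.
    # /dev/null makes grep print the file name even if find passes a single file.
    find "$folder" $depth -type f -name "*.$ext" \
        -exec grep "$flags" -- "$pattern" /dev/null {} +
}

# Throwaway test data.
work=$(mktemp -d)
mkdir -p "$work/docs/sub"
printf '%s\n' 'Grep is a Unix tool' > "$work/docs/a.txt"
printf '%s\n' 'grep again'          > "$work/docs/sub/b.txt"
printf '%s\n' '<p>grep</p>'         > "$work/docs/c.html"

# Defaults (case sensitive, first level, txt): "Grep" in a.txt does not match.
n_default=$(search_with_settings 0 0 txt grep "$work/docs" | wc -l | tr -d ' ')
# ignore_case=1: a.txt now matches.
n_nocase=$(search_with_settings 1 0 txt grep "$work/docs" | wc -l | tr -d ' ')
# ignore_case=1, recursive=1, ext='*': all three files match.
n_all=$(search_with_settings 1 1 '*' grep "$work/docs" | wc -l | tr -d ' ')
echo "default: $n_default, ignore_case: $n_nocase, everything: $n_all"
rm -rf "$work"
```

Note that the `*.*` filter skips files without any extension, which happens to mirror the behaviour described for the ext option.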
The two options save_symlink and add_to_symlink are somewhat special, and need to be explained. I use egrep as the search engine for my application, which can perform 'OR' search.
For example, if you want to search for lines which contain '尸棄' OR '光明' in T09, you would...:
- Drag & drop the folder T09 onto the icon of unix_grep.app
- Type '尸棄|光明' in the dialog asking you to enter the term to search, and you will get a list of 1253 matched lines, which contain either '尸棄' or '光明', or both at the same time. It is the operator '|' which means 'OR' search.
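The same 'OR' search can be tried directly in Terminal. A minimal sketch, assuming a throwaway file whose four lines are placeholders rather than Taisho text:

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' \
    '尸棄 only' \
    '光明 only' \
    '尸棄 and 光明 together' \
    'neither term' > "$work/sample.txt"

# -E enables extended regular expressions, where | means OR.
# -c prints the number of matching lines instead of the lines themselves.
or_hits=$(grep -E -c '尸棄|光明' "$work/sample.txt")
echo "lines matching either term: $or_hits"
rm -rf "$work"
```

A line containing both terms is counted once, since grep reports matching lines, not individual occurrences.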
But it is impossible to do 'AND' search with grep or egrep. For example, you might want to find out files which contain both '尸棄' AND '光明'; this is impossible with a simple grep or egrep search. To achieve this goal, you have first to find out files containing (for example) '尸棄'; then find those containing the word '光明' in the found files. This is for such cases that the save_symlink option can be useful.
- First, you will set the option save_symlink to 1 (double-click on unix_grep.app, select the save_symlink option, enter 1, press Finished, press OK...); then
- You will drag and drop the same T09 onto the icon of unix_grep.app
- Type (for example) '尸棄' in the first dialog.
- You will see a list window showing 10 lines matching the word '尸棄' in T09; the title of the window will display:
Save_symlink mode: Found 10 match(es) in 3 file(s)...
and the prompt of the window will say:
Press OK to save the symlink files (existing symlink file[s] will be deleted...)
-- So, there are only 3 files in T09 in which the word '尸棄' occurs.
- Hitting the Return key, you will save the symbolic linked files of the matched files in your folder grep_symlinks_folder (selecting an item in the list has no meaning in Save_symlink mode!).
Opening the grep_symlinks_folder, you will find 3 files, named T09n0262.txt, T09n0264.txt, and T09n0278.txt -- each of them having a little arrow at the lower left corner of the icon, indicating that they are symbolic linked files (a symbolic linked file is a kind of alias file used in Unix; it is very small [only 4 KB each]; double-clicking on its icon will open the original file linked to it).
- Now, set the option save_symlink to 0;
- Drag and drop the folder grep_symlinks_folder onto unix_grep.app
- Enter the word '光明' in the first dialog;
- You will get a list of 1085 matched lines...
This means that the 'OR' search is extensive, while the 'AND' search is restrictive.
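The two-pass idea behind this 'AND' search (first find the files containing one term, then search only those files for the other) can also be sketched in Terminal without symbolic links. This is an illustration under assumptions, not what the droplet runs: the three sample files are placeholders, and `--null` is a grep option available in GNU grep and in the grep shipped with OS X rather than a POSIX one.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/T09"
printf '%s\n' '尸棄' '光明' > "$work/T09/both.txt"
printf '%s\n' '尸棄'        > "$work/T09/first_only.txt"
printf '%s\n' '光明'        > "$work/T09/second_only.txt"

# Pass 1: -l lists only the names of the files that contain the first term.
# Pass 2: xargs hands those names to a second grep that looks for the other term.
# --null and -0 separate the names with zero bytes, so names with spaces survive.
and_files=$(grep -l --null '尸棄' "$work"/T09/*.txt | xargs -0 grep -l '光明')
echo "files containing both terms: $(basename "$and_files")"
expected="$work/T09/both.txt"
rm -rf "$work"
```

Replacing the second `grep -l` with `grep -n` would list the matching lines in those files instead of just their names.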
Now, for the other option, that is add_to_symlink:
This option is meaningful only when the option save_symlink is set to 1. If the option add_to_symlink is set to 0, all the symbolic linked files that are in the folder grep_symlinks_folder will be deleted at each search in the Save_symlink mode, but if you set this option to 1, the symbolic linked files that are already in the grep_symlinks_folder will not be deleted.
This can be useful when you want to gather symbolic linked files satisfying some condition from one search session to another (with the previous version, which accepted only one folder, this was more crucial...).
For example, you have gathered in the previous example symbolic linked files containing the word '尸棄' that were in the folder T09. If you want to add to these files symbolic linked files satisfying the same condition from the folder T10, you will set the option add_to_symlink to 1, and drop the folder T10 on unix_grep.app, and perform the same search. You will get then 3 more files in the grep_symlinks_folder: T10n0279.txt, T10n0293.txt and T10n0294.txt. You can do any other searches on these files if you drop the grep_symlinks_folder onto unix_grep.app (you probably should set the options save_symlink and add_to_symlink to 0).
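The difference between the two modes can be mimicked with ln -s in Terminal. The following is a hedged sketch of the behaviour as described above, not the droplet's implementation: the function `gather_symlinks`, its argument order and the sample file names are all hypothetical, and the source folder should be given as an absolute path so that the links resolve.

```shell
#!/bin/sh
# Hypothetical helper. Arguments: add (0 or 1), pattern, source folder, link folder.
gather_symlinks() {
    add=$1; pattern=$2; src=$3; dest=$4
    mkdir -p "$dest"
    # add=0: start from an empty link folder, like add_to_symlink set to 0.
    [ "$add" = 1 ] || find "$dest" -type l -exec rm -f {} +
    for f in "$src"/*.txt; do
        [ -f "$f" ] || continue
        # -q: no output, only the exit status tells whether the file matches.
        if grep -q -E -- "$pattern" "$f"; then
            ln -sf "$f" "$dest/"
        fi
    done
}

work=$(mktemp -d)
mkdir "$work/T09" "$work/T10"
printf '%s\n' '尸棄 placeholder' > "$work/T09/sample_a.txt"
printf '%s\n' 'no match here'    > "$work/T09/sample_b.txt"
printf '%s\n' '尸棄 placeholder' > "$work/T10/sample_c.txt"
links="$work/grep_symlinks_folder"

gather_symlinks 0 '尸棄' "$work/T09" "$links"
count_first=$(ls "$links" | wc -l | tr -d ' ')
gather_symlinks 1 '尸棄' "$work/T10" "$links"      # keep the T09 links, add T10
count_added=$(ls "$links" | wc -l | tr -d ' ')
gather_symlinks 0 '尸棄' "$work/T10" "$links"      # start over: only T10 remains
count_reset=$(ls "$links" | wc -l | tr -d ' ')
echo "links after each run: $count_first, $count_added, $count_reset"
rm -rf "$work"
```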
The last option, special_scripts, was explained above, in the 'Notes on the new version 0.7.1'.
If you want to perform searches in Arabic (or certainly Hebrew or probably Devanagari or other languages/scripts using ligatures), you have to set this option to 1. The list selecting window will be skipped, and the search result file will be opened directly in your default application. You will have to select one line of this result file, and run the script open_file_fromGrepRes, to open the target file, and select the target term. -- This is due to a bug in AppleScript, and this was the only way I could work around it.
Note that you can set this option to 1 for other languages/scripts, if you don't need the list selecting window. It is certainly faster and more robust than using the list selecting window. So, if you think that the search result will be very large, it will probably be better to use this option.
Supplementary notes:
A. You can use the recursive option to search files in nested folders inside one folder. For example, if you have a folder named 'Taisho', in which you have folders such as T01, T02, T03... T85, you can search all the files in these sub-folders with the option recursive set to 1 (a search for the term '摩訶迦羅天' in all the CBETA Taisho files -- which finds 10 matched lines -- takes less than one minute on my machine, a now rather slow PowerPC G4 Dual 867 MHz. The time needed for the search seems to depend more on the number of hits than on the number of files to be parsed...).
You can also drop more than one folder or file onto unix_grep.app. But you can perform more sophisticated searches if you use symlinked folders, and for that, you can use another utility of mine, named make_symlink.app, which you will find on my page Make Symlink. For example, you can do something like the following:
- Make a new empty folder where you want, and name it, for example, 'agama';
- Locate your folders T01 and T02, and drag and drop them onto the icon of make_symlink.app;
- A folder choosing dialog will ask you to select the folder you want: you would select the newly created folder 'agama'.
You can use the same technique to perform other kind of searches: for example, you would locate all the files whose translator is 鳩摩羅什, gather symbolic linked files of these files in a folder named 'translations_kumarajiva', and search terms in these files, etc.
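The 'agama' arrangement can be imitated in Terminal as well. A minimal sketch, assuming throwaway folders and placeholder files; it uses find -L for the traversal because grep implementations differ in whether their recursive option follows symbolic links, and it is not a description of how make_symlink.app or unix_grep.app work internally.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/Taisho/T01" "$work/Taisho/T02" "$work/agama"
printf '%s\n' 'placeholder 大自在 line' > "$work/Taisho/T01/sample1.txt"
printf '%s\n' 'another placeholder'     > "$work/Taisho/T02/sample2.txt"

# Link both volume folders into "agama" instead of copying them.
ln -s "$work/Taisho/T01" "$work/Taisho/T02" "$work/agama/"

# find -L follows the symbolic links, so the search descends into T01 and T02.
# /dev/null is an extra "file" that makes grep always print the file name.
agama_hits=$(cd "$work/agama" && \
    find -L . -type f -name '*.txt' -exec grep -E -n '大自在' /dev/null {} +)
printf '%s\n' "$agama_hits"
rm -rf "$work"
```

The original folders stay where they are; deleting 'agama' removes only the links.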
B. I would recommend verifying the settings of unix_grep.app each time before you use it. To do this, double-click on its icon; you will see the list selecting window showing the current settings. You can simply hit the Return key if you are satisfied with the settings; otherwise, select one item to change it...
C. You should learn also how egrep works, and what wildcard characters can be used. Please have a look at (for example):
http://www.wellho.net/regex/grep.html
D. Due to a bug in AppleScript's droplet mechanism, file or folder names in Arabic (or other 'special' languages) will not be recognized. In such cases, the best is simply to change these file/folder names into Roman names. But the search itself can be done if you put your files/folders with 'special' language names in a folder with a Roman name. You can put symlinked files/folders in a folder with a Roman name as well (don't forget to set the option recursive to 1 if the text files are inside sub-folders...).
E. A final note of warning: I think unix_grep.app is rather robust, but it is a simple AppleScript utility: you should NEVER search for words which may occur more than one or two thousand times. For example, NEVER try to search for '佛' in all the Taisho canon! That would certainly crash the application, and perhaps even the system!!
Download
Please download the package from this link (171K to download).
I would appreciate any feedback, comments, bug reports or requests.
Thank you!
program'. If that doesn't help, it's probably because you're wondering what a
regular expression ('re' or 'regex') is. Basically, it's a pattern used to describe
a string of characters, and if you want to know aaaaaaall about them, I highly
recommend reading Mastering Regular Expressions by Jeffrey Friedl and
published by Unix über-publisher O'Reilly & Associates.
Regexes (regices, regexen, ...the pluralization is a matter of debate) are an extremely
useful tool for any kind of text processing. Searching for patterns with grep is
most people's first exposure to them, as like the article says, you can use them to search
for a literal pattern within any number of text files on your computer. The cool thing is
that it doesn't have to be a literal pattern, but can be as complex as you'd like.
The key to this is understanding that certain characters are 'metacharacters', which have
special meaning for the regex-using program. For example, a plus character (+) tells the
program to match one or more instances of whatever immediately precedes it, while parentheses
serve to treat whatever is contained as a unit. Thus, 'ha+' matches 'ha', but it also matches
'haa' and 'haaaaaaaaaaa', but not 'hahaha'. If you want to match the word 'ha', you can use
'(ha)+' to match one or more instances of it, such as 'hahaha' and 'hahahahahahahahaha'.
Using a vertical bar allows alternate matching, so '(ha|ho)+' matches 'hohoho', 'hahaha', and
'hahohahohohohaha'. Etc.
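These claims are easy to check from the command line. A small sketch, assuming grep -E for the extended syntax; the -x option requires the whole line to match, which mirrors the 'matches / does not match' wording above (without -x, grep reports any line that merely contains a match, so 'ha+' would also report 'hahaha').

```shell
#!/bin/sh
words='ha
haa
hahaha
hohoho
hahohaho'

# 'ha+': an h followed by one or more a's.
plus_single=$(printf '%s\n' "$words" | grep -E -x 'ha+')
# '(ha)+': the unit "ha" repeated one or more times.
plus_group=$(printf '%s\n' "$words" | grep -E -x '(ha)+')
# '(ha|ho)+': any mix of "ha" and "ho".
alternation=$(printf '%s\n' "$words" | grep -E -x '(ha|ho)+')

echo "ha+      :" $plus_single
echo "(ha)+    :" $plus_group
echo "(ha|ho)+ :" $alternation
```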
There are many of these metacharacters to keep in mind. Inside brackets ([]), a caret (^)
means that you don't want to match any of the characters that follow inside the brackets.
Note that brackets list single characters, not words: for Magritte fans, '[^(a cigar)]'
matches any one character other than '(', ')', a space, or the letters a, c, i, g and r.
The rest of the time, the caret tells the program to match only at the beginning of a line,
while a dollar sign ($) matches only at the end. Therefore, '^everything$' matches the word
'everything' only when it is on a line all by itself, and '^[^(anything else)]' matches all
lines whose first character is not one of the characters listed inside the brackets.
The period (.) matches any character at all, and the asterisk (*) matches zero or more times.
Compare this to the plus, which matches one or more times -- a subtle but important
difference. A lot of regular expressions look for '.*', which is zero or more of anything
(that is, anything at all). This is useful when searching for two things that might or might
not have anything else (that you probably don't care about) between them: 'foo.*bar' will match
on 'foobar', 'foo bar' & 'foo boo a wop bop a lop bam boo bar'. Changing the previous example
to a plus, 'foo.+bar', requires that something -- anything -- come between foo and bar, but it doesn't matter
what, so 'foobar' doesn't match but the other two examples given do match.
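A short sketch to try the anchors, a bracket expression and the star/plus difference on a few sample lines. It assumes grep -E; -c prints the number of matching lines. The pattern '^[^f]' is an extra illustration, not one of the examples above: a bracket expression stands for a single character, so it selects lines whose first character is anything but f.

```shell
#!/bin/sh
lines='foobar
foo bar
foo boo a wop bop a lop bam boo bar
everything
not everything'

# '.*' allows nothing at all between foo and bar, so all three foo lines match.
star=$(printf '%s\n' "$lines" | grep -E -c 'foo.*bar')
# '.+' needs at least one character in between, so 'foobar' drops out.
plus=$(printf '%s\n' "$lines" | grep -E -c 'foo.+bar')
# ^ and $ pin the pattern to the start and end of the line.
anchored=$(printf '%s\n' "$lines" | grep -E -c '^everything$')
# Lines whose first character is not f.
not_f=$(printf '%s\n' "$lines" | grep -E -c '^[^f]')

echo "foo.*bar: $star  foo.+bar: $plus  ^everything\$: $anchored  ^[^f]: $not_f"
```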
For details, try the man pages -- 'man grep'. There are a lot of different versions of the
program, so details may vary. All of this should be valid for OSX though.
Confusing? Maybe, but regular expressions aren't that bad when you get used to them, and
they can be a very useful tool to take advantage of if you know what you're doing. An example.
Let's say you have a website stored on your computer as a series of html documents.
As a cutting edge developer, you've seen the CSS light and want to delete all the <font>
tags wherever they're just saying e.g. face='sans-serif' &/or size='12', because the
stylesheet can now do that for you. On the other hand, it's possible that the patterns
face='sans-serif' or size='12' could show up in normal text (though admittedly
that's unlikely). In fact, what you really want to know is wherever those patterns show up in
a font tag, but you don't care about anywhere else that they might appear. Here's one way to
find that pattern:
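A minimal sketch of such a command, tried on a throwaway site. Several details are assumptions: the exact pattern, the use of -E for extended syntax, double quotes around the attribute values in the HTML, and the --include option (available in GNU grep and in the grep shipped with OS X, but not a POSIX option) to restrict the recursive search to htm and html files.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/site/sub"
printf '%s\n' '<FONT face="sans-serif" size="12">Hello</FONT>' > "$work/site/index.html"
printf '%s\n' '<p>size="12" appears here as plain text</p>'    > "$work/site/about.htm"
printf '%s\n' '<font color="red" size="12">Note</font>'        > "$work/site/sub/page.html"

# -i: ignore case; -r: descend into sub-directories, starting from "."
# Pattern: "<font", then any characters except ">", then one of the attributes.
font_hits=$(cd "$work/site" && grep -E -i -r \
    --include='*.htm' --include='*.html' \
    '<font[^>]*(face="sans-serif"|size="12")' .)
printf '%s\n' "$font_hits"
font_hit_count=$(printf '%s\n' "$font_hits" | wc -l | tr -d ' ')
rm -rf "$work"
```

In this sample, about.htm is not reported: its size="12" sits in ordinary text, outside any font tag.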
This does a number of things. The -i tells grep to ignore case (otherwise it's case sensitive,
and won't match 'FONT' if you're looking for 'font' or 'Font'). The -r tells it to recursively
descend through the directories from wherever the command starts -- in this case, all htm and
html files in the current directory. Everything in single quotes is the pattern we're matching.
We tell grep to match on any text that starts with '<font', continues with anything other than '>' (thus staying within the font tag), and then either the face or
size definition that we're interested in. The one glitch here is that line breaks can break
things, though there are various ways around that. Finding them is left as the proverbial
exercise for the reader. :)
The next question is, what do you want to do with this information you've come up with?
Presumably you want to edit those files in order to fix them, right? With that in mind, maybe
it would be useful to just make a list of matches. Grep normally outputs all the lines that
match the pattern, but if you just want the filenames, use the -l switch. If you want to save
the results into a file, redirect the output of the command accordingly. With those changes,
we now have:
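A sketch of that variant, under the same assumptions as any reconstruction of this example (the pattern, the -E and --include options, and the sample site are illustrative; the output file name font_files.txt is hypothetical):

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/site/sub"
printf '%s\n' '<FONT face="sans-serif" size="12">Hello</FONT>' > "$work/site/index.html"
printf '%s\n' '<p>size="12" appears here as plain text</p>'    > "$work/site/about.htm"
printf '%s\n' '<font color="red" size="12">Note</font>'        > "$work/site/sub/page.html"

# -l prints only the names of the files that match, one per line.
# "> file" redirects that list into a file instead of the screen.
(cd "$work/site" && grep -E -i -r -l \
    --include='*.htm' --include='*.html' \
    '<font[^>]*(face="sans-serif"|size="12")' . > "$work/font_files.txt")

listed=$(sort "$work/font_files.txt")
printf '%s\n' "$listed"
rm -rf "$work"
```

The list is written outside the site folder here, so that it cannot itself turn up in a later search.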
Great. But we can do better still. If you are comfortable with the vi editor, you can call vi
with that command directly. The trick is to wrap the command in backticks (`). This is a cool
little Unix trick that runs the contained command & returns the result for whatever you want
to do with it. Thus you can simply put this command:
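A sketch of the backtick technique, with echo standing in for vi so that it can be run non-interactively; remove the echo to open the files for real. The pattern and the sample files are assumptions, and the search here is limited to the htm and html files of the current directory.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' '<font face="sans-serif">Home</font>' > "$work/index.html"
printf '%s\n' '<font size="12">News</font>'         > "$work/news.html"
printf '%s\n' '<p>No font tag here</p>'             > "$work/about.htm"

# The shell first runs the command inside the backticks, then substitutes its
# output (the list of matching file names) into the outer command line.
cmd=$(cd "$work" && echo vi `grep -E -i -l '<font[^>]*(face="sans-serif"|size="12")' *.htm *.html`)
echo "$cmd"
rm -rf "$work"
```

Because the file names are substituted as plain words, names containing spaces would be split apart; for such files a different approach is needed.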
The result of this command, as far as your tcsh shell is concerned, is something along the lines
of 'vi' followed by the names of all the matching files.
The beautiful thing here is that if you quit vi & re-run the command later, it will be
able to effectively 'pick up where you left off', since files you've already edited will
presumably no longer match the grep command.
And if you want to get really ambitious, you can use these techniques in ways that
allow you to do all your editing directly from the command line, without having to go into an
interactive editor such as vi or emacs or whatever. If you make it this far in your experiments,
then the next step is to learn to filter the results of a match and process the filtered data
in some way, using tools such as sed, awk, and perl. Using these tools, you can find all
instances of the pattern in question, break it down however you like, substitute or shuffle the
parts around however you like, and then build it all back up again. This is fun stuff! By this
point, you're getting pretty heavily into Unix arcana, and the best book that I've seen about
these tricks is O'Reilly's Unix Power Tools, by various authors. If you really want to leverage
the power of the tools that all Unixes come with, including OSX, then this is a great place to
both start & end up. There's plenty of material in there to keep you busy for months & years...